Regularized estimation of linear functionals of 
precision matrices for high-dimensional time series

Abstract 

This paper studies a Dantzig-selector type regularized estimator for linear functionals of
high-dimensional linear processes. Explicit rates of convergence of the proposed estimator are obtained and 
they cover the broad regime from i.i.d. samples to long-range dependent time series and from
sub-Gaussian innovations to those with mild polynomial moments. It is shown that the convergence rates 
depend on the degree of temporal dependence and the moment conditions of the underlying linear 
processes. The Dantzig-selector estimator is applied to the sparse Markowitz portfolio allocation and 
the optimal linear prediction for time series, in which the ratio consistency when compared with an 
oracle estimator is established. The effect of dependence and innovation moment conditions is further 
illustrated in the simulation study. Finally, the regularized estimator is applied to classify the cognitive 
states on a real fMRI dataset and to portfolio optimization on a financial dataset. 


I. Introduction 

Multivariate time series data arise in a broad spectrum of real applications. Let $x_i$, $i \in \mathbb{Z}$, be
a $p$-dimensional stationary time series with mean $\mu$ and covariance matrix $\Sigma = \mathrm{cov}(x_i)$. Given
the sample $x_i$, $i = 1, \ldots, n$, we consider estimation of linear functionals of the form $\theta = \Sigma^{-1} b$,
where $b$ is a $p \times 1$ vector. Such functionals appear in Markowitz Portfolio (MP) allocation,
linear discriminant analysis (LDA), beamforming in array signal processing, best linear unbiased
estimator (BLUE) and optimal linear prediction for univariate time series. See [1], [2], [3], [4],
which can all be formulated as solutions of the general linear equality constrained quadratic
programming (QP) problem

$$ \text{minimize}_{w \in \mathbb{R}^p} \; w^\top \Sigma w \quad \text{subject to } w^\top b = m. \tag{1} $$

It is clear that the solution is $w^* = m \Sigma^{-1} b / (b^\top \Sigma^{-1} b) \propto \theta$ and the value of (1) is $m^2 / (b^\top \Sigma^{-1} b)$.
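
As a numerical sanity check of the closed-form solution of (1), the following minimal sketch evaluates $w^*$ and the optimal value on a small random problem. It is written in Python with NumPy, which is an assumption of this illustration and not a tool used in the paper; the function name `qp_solution` and the test matrices are hypothetical.

```python
import numpy as np

def qp_solution(Sigma, b, m):
    """Closed-form minimizer of w' Sigma w subject to w' b = m."""
    theta = np.linalg.solve(Sigma, b)   # theta = Sigma^{-1} b
    denom = b @ theta                   # b' Sigma^{-1} b
    w_star = m * theta / denom          # proportional to theta
    value = m ** 2 / denom              # optimal objective value
    return w_star, value

# Illustrative positive definite covariance and constraint vector.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
Sigma = A @ A.T + 5 * np.eye(5)
b = rng.standard_normal(5)
m = 2.0
w_star, value = qp_solution(Sigma, b, m)
```

The constraint $w^{*\top} b = m$ and the identity $w^{*\top} \Sigma w^* = m^2/(b^\top \Sigma^{-1} b)$ can both be checked directly on the returned values.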

To estimate $\theta$, traditional approaches take two steps: (i) an estimate $\hat\Sigma$ of $\Sigma$ is constructed
and (ii) $\theta$ is estimated using $\hat\Sigma^{-1} b$, or $\hat\Sigma^{-1} \hat b$ if $b$ is unobserved. Although the two-step estimator is
asymptotically consistent for $\theta$ in the classical fixed and low dimensional case, it may no longer
work in high dimensions. First, consistent estimation of $\Sigma$ or its inverse is a challenging problem
in the high-dimensional setting. Under sparseness or other structural conditions on $\Sigma$ or $\Sigma^{-1}$,
researchers studied regularized covariance matrix estimators [5], [6], [7], [8], [9], [10], [11],
[12], [13], [14], [15], [16]. Without such structural conditions it is unclear how one can obtain
a consistent estimator. Second, consistent estimation of $\Sigma$ or its inverse does not automatically
imply consistency of $\hat\Sigma^{-1} b$ or $\hat\Sigma^{-1} \hat b$ since $|b|_2 = \sqrt{b^\top b}$ may also increase with the dimension
$p$. Indeed, to estimate $\theta$ by a "plug-in" method $\hat\theta = \hat\Sigma^{-1} b$, we can only get in the worst case
$|\hat\theta - \theta|_2 = |\hat\Sigma^{-1} b - \Sigma^{-1} b|_2 \le \rho(\hat\Sigma^{-1} - \Sigma^{-1}) |b|_2$, where $\rho$ is the spectral norm. If $|b|_2$ diverges
to infinity at a faster rate than $\rho(\hat\Sigma^{-1} - \Sigma^{-1})$ converges to zero, then the plug-in estimator does not converge.

Direct estimation for functionals of covariance matrices is studied in [17], [18], [19], [20],
[21], [22] among others for independent and identically distributed (i.i.d.) data. Allowing serial
dependence, [23] established an asymptotic theory for sparse covariance matrix estimators. That
work, however, does not directly deal with estimating the linear functional $\theta$ and it can only
handle weakly temporally dependent processes. It rules out many interesting applications such as
long memory or long-range dependent time series in the fields of hydrology, network traffic,
economics and finance ([24], [25], [26], [27]).

In this paper we shall focus on direct estimation of $\theta$ for both short- and long-range dependent
time series. Here we assume that $(x_i)$ has the form of a vector linear process

$$ x_i = \mu + \sum_{m=0}^{\infty} A_m \xi_{i-m}, \tag{2} $$

where $\mu$ is the mean vector, $A_m$ are $p \times p$ coefficient matrices, $\xi_i = (\xi_{i,1}, \ldots, \xi_{i,p})^\top$ and
$(\xi_{i,j})_{i,j \in \mathbb{Z}}$ are i.i.d. random variables (a.k.a. innovations) with zero mean and unit variance. To
develop high-dimensional asymptotics, following the setting in Section 2.4.2 in [28], we shall
deal with the triangular array of observations of $p_k$-dimensional vectors indexed by $i = 1, \ldots, n_k$,
$k = 1, 2, \ldots$, with $\min(n_k, p_k) \to \infty$. Hereafter for notational simplicity, we omit the subscript
$k$ and the asymptotic relation is referred to as $\min(n, p) \to \infty$.

The vector linear process is a flexible model in that $A_m$ captures both the spatial and temporal
dependences. The decay rate of $A_m$ (see (5)) determines whether the process is temporally weakly or temporally
strongly dependent, and we shall deal with both cases. An important special case of (2) is the
stationary Gaussian process. Another example is the vector auto-regression (VAR) model

$$ x_i = B_1 x_{i-1} + \cdots + B_d x_{i-d} + \xi_i, \tag{3} $$

where $B_1, \ldots, B_d$ are coefficient matrices such that (3) has a stationary solution. The above model
is widely used in economics and finance [29], [30], [31], [32], [33]. Recent developments have
been made in the estimation and sparse recovery of the VAR model under high dimensionality
[34], [35], [36], [37], [38]. The linear process model (2) is flexible enough to include: (i) long-range
dependence (LRD); (ii) non-Gaussian distributions with possibly heavy tails. In network
traffic analysis [25], it is well recognized that (i) is the Joseph effect, i.e. the degree of
self-similarity, and (ii) is the Noah effect, i.e. the heaviness of the tail. In addition, those concerns
are also relevant to a large body of other real applications in finance, economics, as well
as biomedical engineering, such as functional Magnetic Resonance Imaging (fMRI) and
microarray data [39], [40], where the signal-to-noise ratio can be low.

A. Method and key assumptions 

We propose the following Dantzig-type [?] estimator

$$ \hat\theta := \hat\theta(\lambda) = \mathrm{argmin}_{\eta \in \mathbb{R}^p} \left\{ |\eta|_1 : |S_n \eta - \hat b|_\infty \le \lambda \right\}, \tag{4} $$

where $\hat b$ is an estimator of $b$ and $S_n$ is the sample covariance matrix. If $b$ is known, then we can
simply use $\hat b = b$. If the mean vector $\mu$ is known, $S_n = n^{-1} \sum_{i=1}^n (x_i - \mu)(x_i - \mu)^\top$; otherwise
$S_n = n^{-1} \sum_{i=1}^n (x_i - \bar x)(x_i - \bar x)^\top$, where $\bar x = n^{-1} \sum_{i=1}^n x_i$. Compared with the two-step methods, the estimate $\hat\theta$ in (4) has
two advantages in terms of both theory and computation. First, since $\theta$ is a $p \times 1$ vector, there
are only $p$ parameters to estimate. The rate of convergence for $\hat\theta$ in (4) can be obtained under very
general temporal dependence and mild moment conditions; see Theorems II.1-II.3 in Section II-A.
Second, $\hat\theta$ can be recast as an augmented linear program (LP)

$$ \begin{aligned}
\text{minimize}_{u \in \mathbb{R}^p,\ \eta \in \mathbb{R}^p} \quad & \sum_{j=1}^p u_j \\
\text{subject to} \quad & -\eta_j \le u_j, \quad \eta_j \le u_j, \quad \forall j = 1, \ldots, p, \\
& -s_k^\top \eta + \hat b_k \le \lambda, \quad s_k^\top \eta - \hat b_k \le \lambda, \quad \forall k = 1, \ldots, p,
\end{aligned} $$

where $s_k$ is the $k$-th column of $S_n$. Let $(\hat u, \hat\eta)$ be a solution of the LP; then $\hat\theta = \hat\eta$. There are
computationally efficient off-the-shelf LP solvers to obtain numerical values of $\hat\theta$ for large-scale
problems. Our estimate and the equivalent LP are similar to the CLIME estimate [22], where $b$
is chosen to be the fixed Euclidean basis vectors.
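
The augmented LP can be handed to a generic solver. The following is a minimal sketch, assuming SciPy's `linprog` with the HiGHS backend; neither SciPy nor the function name `dantzig_functional` comes from the paper, and the identity matrix used in the usage lines is only a toy stand-in for a sample covariance matrix.

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_functional(S, b_hat, lam):
    """Solve min |eta|_1 subject to |S eta - b_hat|_inf <= lam as an LP in (u, eta)."""
    p = S.shape[0]
    I = np.eye(p)
    Z = np.zeros((p, p))
    # Decision vector z = (u, eta); the objective is sum_j u_j.
    c = np.concatenate([np.ones(p), np.zeros(p)])
    A_ub = np.vstack([
        np.hstack([-I, I]),    #  eta_j - u_j <= 0
        np.hstack([-I, -I]),   # -eta_j - u_j <= 0
        np.hstack([Z, S]),     #  s_k' eta <= lam + b_k
        np.hstack([Z, -S]),    # -s_k' eta <= lam - b_k
    ])
    b_ub = np.concatenate([np.zeros(2 * p), lam + b_hat, lam - b_hat])
    bounds = [(0, None)] * p + [(None, None)] * p
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[p:]           # the eta block is the estimate

# Toy usage: with S equal to the identity the solution is coordinate-wise shrinkage of b_hat.
S = np.eye(3)
b_hat = np.array([1.0, 0.05, -2.0])
theta_hat = dantzig_functional(S, b_hat, lam=0.1)
```

The dense constraint matrix has $4p$ rows and $2p$ columns, so for large $p$ a sparse formulation would likely be preferable; the dense version is kept here only for readability.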

Now, we state our key assumptions and discuss their implications. First, we need to impose
conditions on the temporal dependence. Write $A_m = (a_{m,jk})_{1 \le j,k \le p}$; let $C_0 \in (0, \infty)$ be a finite
constant. We assume that the linear process satisfies the decay condition

$$ \max_{j \le p} |A_{m,j\cdot}| = \max_{j \le p} \Big( \sum_{k=1}^p a_{m,jk}^2 \Big)^{1/2} \le C_0 (1 \vee m)^{-\beta} \tag{5} $$

for all $m \ge 0$, where $\beta > 1/2$ and $|A_{m,j\cdot}|$ is the $\ell^2$ norm of the $j$-th row of $A_m$. If $\beta > 1$, (5)
implies short-range dependence (SRD) since the auto-covariance matrices $\Gamma_k = \sum_{m=0}^\infty A_m A_{m+k}^\top$
are absolutely summable. On the other hand, if $1 > \beta > 1/2$, then $(x_i)$ in (2) may not have
summable auto-covariance matrices, thus allowing long-range dependence (LRD). The classical
literature on LRD primarily focuses on the univariate case $p = 1$.

Next, we shall specify the tail conditions on the innovations $\xi_{i,j}$. We say that $\xi_{1,1}$ is
sub-Gaussian if there exists $t > 0$ such that $E\exp(t \xi_{1,1}^2) < \infty$, or equivalently, there exists a constant
$C_\xi < \infty$ such that

$$ \|\xi_{1,1}\|_q := [E(|\xi_{1,1}|^q)]^{1/q} \le C_\xi q^{1/2} \tag{6} $$

holds for all $q \ge 1$. A slightly weaker version is the (generalized) sub-exponential distribution.
Let $\alpha > 1/2$. Assume that for some $t > 0$, $E\exp(t |\xi_{1,1}|^{1/\alpha}) < \infty$, or

$$ \|\xi_{1,1}\|_q \le C_\xi q^{\alpha} \quad \text{holds for all } q \ge 1. \tag{7} $$

Equivalently, for all $x > 0$, $P(|\xi_{1,1}| \ge x) \le C_1 \exp(-C_2 x^{1/\alpha})$ holds for some $C_1, C_2 > 0$. In the
study of vector autoregressive processes, the issue of fat tails can frequently arise [?] and it
can affect the validity of the associated statistical inference. In this paper we shall also consider
the case in which $\xi_{1,1}$ only has a finite polynomial moment: there exists a $q > 1$ such that

$$ \|\xi_{1,1}\|_q < \infty. \tag{8} $$

The tail distribution condition (or equivalently the moment condition) severely affects rates of
convergence of various covariance matrix estimates. As a primary goal of this paper, we shall
develop an asymptotic theory for convergence rates of linear functional estimates with various
levels of temporal dependence and for innovations having sub-Gaussian (including bounded and
Gaussian as special cases) (cf. (6)), sub-exponential (cf. (7)) and algebraic (cf. (8)) tails.

Finally, we assume that the linear functional $\theta$ is "sparse" in the sense that most of its entries
have small magnitudes. This is a plausible assumption in real applications such as portfolio
selection [42], [43], LDA [19], and optimal estimation and prediction for time series [44]. For
instance, to obtain stable portfolio optimization and reduce transaction costs for a large number
of assets, sparse portfolios have been considered by adding an $\ell^1$ penalty in the objective function. In
LDA, classification based on the sparse Bayes direction has been studied in [19]. Our estimator
(4) is also closely related to the Dantzig selector for the linear regression model [?]. Let
$y = X\theta + e$, where $X = n^{-1/2} (x_1, \ldots, x_n)^\top$ is the design matrix and $e \sim N(0, \mathrm{Id}_{n \times n})$. The
Dantzig selector is defined as the solution of

$$ \text{minimize}_{\eta \in \mathbb{R}^p} |\eta|_1 \quad \text{subject to } |X^\top (X\eta - y)|_\infty \le \lambda. \tag{9} $$

Since $X^\top (X\eta - y) = S_n \eta - X^\top y$, (9) is equivalent to (4) with $\hat b = X^\top y$. When the dimension $p$
is large, it is reasonable to assume that prediction using a small number of predictors is desirable
for practical modeling, statistical analysis and interpretation.
II. Main results

In this section, we shall first present the rate of convergence of (4) for the linear functional
$\theta = \Sigma^{-1} b$. The convergence rate is characterized under various vector norms for linear processes
with a broad range of dependence levels and tail conditions. Then, we present two applications
to derive the ratio consistency of direct estimation for sparse Markowitz portfolio allocation and
optimal linear prediction.

We now introduce some notation. Denote by $C, C', C_1, C_2, \ldots$ positive constants (independent
of the sample size $n$ and the dimension $p$), whose values may vary from place to place. Let
$a$ be a vector in $\mathbb{R}^p$, $M = (m_{jk})$ be a $p \times p$ matrix, $X$ be a random variable and $q \ge 1$. Write $|a|_q = (\sum_{j=1}^p |a_j|^q)^{1/q}$,
$|a| = |a|_2$ and $|a|_\infty = \max_{1 \le j \le p} |a_j|$. Let $\rho(M) = \max\{|Ma| : |a| = 1\}$ be
the spectral norm of $M$, $|M|_{L_1} = \max_{1 \le k \le p} \sum_{j=1}^p |m_{jk}|$, $|M|_F = (\sum_{j,k=1}^p m_{jk}^2)^{1/2}$ and
$|M|_1 = \sum_{j,k=1}^p |m_{jk}|$. We write $X \in \mathcal{L}^q$ if $\|X\|_q = (E|X|^q)^{1/q} < \infty$. Denote $\|X\| = \|X\|_2$. For two
sequences of quantities $a := a_{n,p}$ and $b := b_{n,p}$, we use $a \lesssim b$, $a \asymp b$, $a \sim b$ and $a \ll b$ to
denote $a \le C_1 b$, $C_2 b \le a \le C_3 b$, $a/b \to 1$ and $a/b \to 0$ as $p, n \to \infty$, respectively. We use
$a \wedge b = \min(a, b)$, $a \vee b = \max(a, b)$, $a_+ = \max(a, 0)$ and $\mathrm{sign}(a) = 1, 0, -1$ if $a > 0$, $a = 0$
and $a < 0$, respectively. For a set $S$, $|S|$ is the cardinality of $S$. Throughout the paper, we use
$\beta' = \min(2\beta - 1, 1/2)$.

A. Convergence rates for estimating linear functionals 

Without loss of generality, we assume $\mu = 0$. We shall use the smallness measure

$$ D(u) = \sum_{j=1}^p (|\theta_j| \wedge u), \quad u \ge 0, $$

to quantify the size of $\theta$. Let $0 \le r < 1$ and

$$ \mathcal{G}_r(\nu, M_p) = \Big\{ \eta \in \mathbb{R}^p : \max_{j \le p} |\eta_j| \le \nu, \ \sum_{j=1}^p |\eta_j|^r \le M_p \Big\}, $$

which contains approximately sparse vectors in the strong $\ell^r$-ball. Here, $\nu$ is a constant
independent of $p$ and we allow $M_p$ to grow with $p$. If $\theta \in \mathcal{G}_r(\nu, M_p)$, then $D(u) \le C_{r,\nu} M_p u^{1-r}$.
Suppose that $r_b$ is the rate of $\hat b$ for estimating $b$ such that

$$ P(|\hat b - b|_\infty \ge C_b r_b) \le 2 p^{-c_b} \tag{10} $$

for some constants $C_b, c_b > 0$. If $b$ is observed, we can take $r_b = 0$ and $c_b = \infty$.
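
To make the smallness measure concrete, the sketch below evaluates $D(u)$ under the assumption that it is the sum of the entries of $|\theta|$ truncated at $u$, and compares it with the elementary bound $\sum_j |\theta_j|^r u^{1-r}$, which follows from $\min(a, u) \le a^r u^{1-r}$ for $a, u \ge 0$. The function names and the example vector are hypothetical and only serve the illustration.

```python
import numpy as np

def smallness(theta, u):
    """D(u) = sum_j min(|theta_j|, u)."""
    return float(np.minimum(np.abs(theta), u).sum())

def lr_ball_bound(theta, u, r):
    """Upper bound sum_j |theta_j|^r * u^(1 - r) on D(u), for 0 <= r < 1."""
    a = np.abs(theta)
    # For r = 0 the l^r "mass" is the number of non-zero entries.
    mass = float(np.count_nonzero(a)) if r == 0 else float((a ** r).sum())
    return mass * u ** (1.0 - r)

theta = np.array([3.0, -0.5, 0.0, 0.01])
D = smallness(theta, u=1.0)                  # 1 + 0.5 + 0 + 0.01
bound_sparse = lr_ball_bound(theta, 1.0, r=0)
bound_half = lr_ball_bound(theta, 1.0, r=0.5)
```

For exactly sparse vectors ($r = 0$) the bound reduces to the number of non-zero entries times $u$, which is why $D(u)$ interpolates between hard sparsity and $\ell^1$-type size.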


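Theorem II.1 (Sub-Gaussian). Let $(x_i)$ be the linear process defined in (2) that satisfies (5) and
(6). Let $J_{n,p,\beta} = (\log(p)/n)^{1/2}$, $(\log(p)/n)^{1/2} \vee (\log(p)/n^{2\beta-1})$, and $\log(p)/n^{2\beta-1}$, for $\beta > 1$,
$1 > \beta > 3/4$ and $3/4 > \beta > 1/2$, respectively. Let $C_b$ and $c_b$ be constants defined in (10). Then
there exist constants $C_1 > 1$, $C_2 > 0$ only depending on $\beta$, $C_0$ in (5) and $C_\xi$ in (6), such that
for $\lambda \ge C_b r_b + C_1 |\theta|_1 J_{n,p,\beta}$, with probability at least $1 - 2p^{-c_b} - 2p^{-C_2}$ we have

$$ |\hat\theta - \theta|_w \le [6 D(5 |\Sigma^{-1}|_{L_1} \lambda)]^{1/w} (2 |\Sigma^{-1}|_{L_1} \lambda)^{1 - 1/w} \tag{11} $$

for $1 \le w \le \infty$. In particular, for $\theta \in \mathcal{G}_r(\nu, M_p)$, with the choice $\lambda = C_b r_b + C_1 \nu^{1-r} M_p J_{n,p,\beta}$,
we have

$$ P\Big( |\hat\theta - \theta|_w \le C_3 M_p^{1/w} \big[ |\Sigma^{-1}|_{L_1} (M_p J_{n,p,\beta} + r_b) \big]^{1 - r/w} \Big) \ge 1 - 2p^{-c_b} - 2p^{-C_2}, \tag{12} $$

where the constant $C_3$ depends only on $r$, $\nu$, $w$ and $C_1$.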


We remark that the bound (11) is homogeneous in $b$. If we rescale $b$ by $t > 0$, then the
right hand side of (11) scales by the same factor $t$. Note that Theorem II.1 is non-asymptotic
and the convergence rates (11) and (12) hold with probability tending to one polynomially fast
in $p$. Consider the case where $\beta > 1$ (short-range dependent case), $\theta \in \mathcal{G}_r(\nu, M_p)$ with $r = 0$
(true sparsity in $\theta$), $r_b = 0$ ($b$ is known). Let $\phi = (\log(p)/n)^{1/2}$ and assume $\log(p)/n \to 0$. For
the choice of $\lambda = C M_p \phi$ for some large enough constant $C$, the asymptotic rate of convergence
(12) can be simplified as

$$ |\hat\theta - \theta|_w = O_P(M_p^{1 + 1/w} |\Sigma^{-1}|_{L_1} \phi), \quad w \in [1, \infty]. \tag{13} $$

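The above bound is generally un-improvable. Consider the special case in which $x_{i,j}$, $i, j \in \mathbb{Z}$,
are i.i.d. $N(0, 1)$. Then $\Sigma = \mathrm{Id}_p$. Let $b = e = (1, 0, \ldots, 0)^\top$, $S_n = n^{-1} \sum_{i=1}^n x_i x_i^\top$ and $\lambda = c\phi$,
where the constant $c > \sqrt{2}$. By elementary calculations, we have with probability going to 1
that $|(S_n - \Sigma) e|_\infty \le \lambda$. Note that (13) gives $|\hat\theta - \theta|_w = O_P(\phi)$. Next we shall argue that

$$ P(|\hat\theta - \theta|_\infty \ge \phi) \to 1 \quad \text{as } p \wedge n \to \infty. \tag{14} $$

To this end, let $\tilde\theta = (1 - \phi, 0, \ldots, 0)^\top$ and let $a_n$ denote the first diagonal entry of $S_n$.
Since $a_n - 1 = O_P(n^{-1/2})$ and $c > \sqrt{2}$, we have with
probability going to one that $|(1 - \phi) a_n - 1| \le \lambda$. Then $P(|S_n \tilde\theta - e|_\infty \le \lambda) \to 1$. Note that if
$|S_n \tilde\theta - e|_\infty \le \lambda$, then $|\tilde\theta|_1 \ge |\hat\theta|_1$. Hence $|\hat\theta - \theta|_\infty \ge |1 - \hat\theta_1| \ge \phi$. Then (14) holds.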


Theorem II.2 (Exponential-type). Assume (5) and (7). Let $\beta' = \min(2\beta - 1, 1/2)$ and

Jn,p,p,a = n"^'(logp)^"+^. (15) 


Let $C_b$ and $c_b$ be constants defined in (10). Then there exist constants $C_1, C_2, C_3 > 0$ only
depending on $\alpha$, $\beta$, $C_0$ in (5), and $C_\xi$ in (7) such that for $\lambda \ge C_b r_b + C_1 |\theta|_1 J_{n,p,\beta,\alpha}$ we have
that with probability at least $1 - 2p^{-c_b} - C_2 p^{-C_3}$, $\hat\theta$ satisfies (11) with $J_{n,p,\beta}$ replaced by $J_{n,p,\beta,\alpha}$.


Theorem IL3 (Polynomial). Assume Q and (|^ with q > L (i) Let (3 > 1 — 1/g. Then there exists 
positive constants Ci,..., (^4 such that, for any e: > 0 and A > A^ := + 

Ci{n~^ logp)^/^), with probability at least p^ := 1 — 2p~^'> —C 2 {e'^^‘^ +p~^^), ( |i7[ ) holds, (ii) Let 
1 — 1/g > {3 > 1/2. Then the conclusion in (i) holds with A > A/ := A^ -f- \0\ie~^/‘^p^/‘^rd~‘^^. 


Take e = 1 and thus Ai = + C^{n~^ logp)^/^), A^ = Ai + 

Let Ai = Ai if /3 > 1 — 1/g, and Ai = A* if 1 — 1/g > /? > 1/2. For 6 G Griy, Mp), Theorem 
|II. 3 1 implies the rate of eonvergenee 

|0 - = Op(Mp“Aj““), we [1,00]. 

The $\ell^2$ norm rates of convergence are summarized in Table I, which shows several interesting
features. First, looking vertically at each column in Table I, we see that the rates of convergence
slow down from SRD to LRD. So the effective sample size shrinks as dependence becomes
stronger. Second, the horizontal trend of Table I shows that the rates of convergence become
worse from sub-Gaussian to exponential-type to polynomial moment conditions. Third, if the
innovations have polynomial moments, then the rate of convergence is determined by a
sub-Gaussian term and a polynomial algebraic tail term.


Remark 1. The boundary cases $\beta = 1$ and $3/4$ for Theorem II.1 can also be dealt with. Assume
sub-Gaussian innovations. Following the argument in the latter theorem, the corresponding $\ell^2$
norm rates of eonvergenee in Table is with u = {u2 log^ n) (resp. u = 

(miA/ log n) V U 2 ) for /3 = 1 (resp. f3 = 3/4). 
TABLE I

Summary: the $\ell^2$ norm rates of convergence (in probability) of $\hat\theta$ under various dependence levels and
tail conditions on the linear process $x_i = \sum_{m=0}^\infty A_m \xi_{i-m}$. Dependence index $\beta \in (1, \infty]$, $\beta \in (3/4, 1)$ and
$\beta \in (1/2, 3/4)$ correspond to the SRD (including i.i.d.), weak and strong LRD cases. Sub-Gaussian
(including bounded and Gaussian), exponential and polynomial correspond to the moment/tail
conditions on $\xi_{i,j}$. We list the rates for $\theta \in \mathcal{G}_r(\nu, M_p)$ under the conditions that $r_b = 0$ ($b$ is observed) and
^ Ml = (logp/n)^''^, M 2 = (logp/n^'^"^), M 3 = (logp)^“+^/n^/^, M 4 = (logp)^“+^/n^^“\ 

Ms = AND Me = 



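Dependence | Sub-Gaussian | Exponential | Dependence | Polynomial
$\beta \in (1, \infty]$ | $M_p^{(3-r)/2} u_1^{1-r/2}$ | $M_p^{(3-r)/2} u_3^{1-r/2}$ | $\beta \in (1, \infty)$ | $M_p^{(3-r)/2} (u_1 \vee u_5)^{1-r/2}$
$\beta \in (3/4, 1)$ | $M_p^{(3-r)/2} (u_1 \vee u_2)^{1-r/2}$ | $M_p^{(3-r)/2} u_3^{1-r/2}$ | $\beta \in [1 - 1/q, 1]$ | $M_p^{(3-r)/2} (u_1 \vee u_5)^{1-r/2}$
$\beta \in (1/2, 3/4)$ | $M_p^{(3-r)/2} u_2^{1-r/2}$ | $M_p^{(3-r)/2} u_4^{1-r/2}$ | $\beta \in (1/2, 1 - 1/q)$ | $M_p^{(3-r)/2} (u_1 \vee u_5 \vee u_6)^{1-r/2}$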


B. Sparse Markowitz portfolio allocation 

In Markowitz portfolio (MP) allocation [1], the risk of a portfolio of $p$ assets $x = (X_1, \ldots, X_p)^\top$
is quantified by the variance of their linear combinations. The optimal portfolio risk for a given
amount of expected return $m$ is formulated as

$$ \text{minimize}_{w \in \mathbb{R}^p} \mathrm{Var}(w^\top x) \quad \text{subject to } E(w^\top x) = m. \tag{16} $$

If $x$ has mean $\mu$ and covariance matrix $\Sigma$, then the MP is equivalent to (1) with $b = \mu$ and the optimal
allocation weights are $w^* = m \Sigma^{-1} \mu / (\mu^\top \Sigma^{-1} \mu)$. For a large number of assets, [43] showed that
the efficient frontier of the MP problem cannot be consistently estimated using the empirical
version and the risk is underestimated. Various regularization procedures have been proposed
[43], [42]. Let $\Delta_p = \mu^\top \Sigma^{-1} \mu = \mu^\top \theta$, where $\theta = \Sigma^{-1} \mu$. Then

$$ w^* = \frac{m}{\Delta_p} \theta \quad \text{and} \quad R(w^*) = \frac{m^2}{\Delta_p}. $$

Note that the MP risk function $R(w) = w^\top \Sigma w$ depends on the distribution of $x$ only through
the covariance matrix. Let $\hat w$ be an estimator of $w^*$. We wish to find a $\hat w$ such that $R(\hat w)$ is
close to $R(w^*)$.


Definition II.1. We say that $\hat w$ is ratio consistent if $R(\hat w)/R(w^*) \to_P 1$.

We impose the following assumptions.

MP 1: $|w^*|_0 \le s$ and $|w^*|_\infty \le C_0$ for some constant $C_0 > 0$.

MP 2: Let $r_2$ (resp. $r_3$) be the rate of convergence of $S_n = n^{-1} \sum_{i=1}^n (x_i - \bar x)(x_i - \bar x)^\top$ (resp. $\bar x$):
$|S_n - \Sigma|_\infty = O_P(r_2)$, $|\bar x - \mu|_\infty = O_P(r_3)$.

MP 3: For some constants $K_1, K_2, C > 0$, $|\mu_j| \le K_1$, $\sigma_{jj} \le K_2$, and $R(w^*) \le C$.

MP 4: There exists an estimator $\hat\theta$ satisfying $|\hat\theta - \theta|_1 = O_P(r_1)$.

MP 1 is a sparsity condition on the oracle portfolio weights. MP 2 is a high-level assumption
on the concentration of maximum norms of the sample mean and covariances about their
expectations, which can be fulfilled for a broad range of moment and dependence conditions on $x_i$. MP
3 is a regularity condition excluding assets with extremely large mean returns and unbounded
risks. MP 4 requires the existence of an estimator for the linear functional $\theta$, which can be
verified by our main result in Section II-A under mild conditions. As a natural condition to get
consistency, we assume $\max(r_1, r_2, r_3) = o(1)$ as $n, p \to \infty$.

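The intuition of the proposed estimator for $w^*$ is explained as follows. Since $w^*$ is sparse, so
is $\theta$ and therefore we can seek a sparse estimator $\hat\theta$ such that $|\hat\theta - \theta|_1 \to_P 0$. Then, we expect

$$ |\mu^\top \theta - \bar x^\top \hat\theta| \le |\mu|_\infty |\hat\theta - \theta|_1 + |\bar x - \mu|_\infty |\hat\theta|_1 \approx 0 $$

so that $|\hat w - w^*|$ is small and $R(\hat w)$ is close to $R(w^*)$. Now, we describe our method, which
contains two steps. First, we estimate $\theta$ by

$$ \text{minimize}_{\eta \in \mathbb{R}^p} |\eta|_1 \quad \text{subject to } |S_n \eta - \bar x|_\infty \le \lambda. \tag{17} $$

Denote the solution by $\hat\theta$. Then, we compute $\hat\Delta_{p,n} = \bar x^\top \hat\theta$ and $\hat w = m \hat\theta / \hat\Delta_{p,n}$.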
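
The second step of the method is a simple plug-in computation. The sketch below illustrates it and the risk ratio used to define ratio consistency; the function names are hypothetical, and the usage lines feed in the oracle quantities $\theta = \Sigma^{-1}\mu$ and $\bar x = \mu$ instead of estimates, so the ratio equals one by construction.

```python
import numpy as np

def portfolio_from_theta(theta_hat, x_bar, m):
    """Plug-in weights w = m * theta_hat / (x_bar' theta_hat)."""
    delta_hat = x_bar @ theta_hat
    return m * theta_hat / delta_hat

def risk(w, Sigma):
    """Markowitz risk R(w) = w' Sigma w."""
    return float(w @ Sigma @ w)

# Illustrative population quantities (not taken from the paper).
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 4 * np.eye(4)
mu = np.array([0.5, 1.0, -0.2, 0.3])
m = 1.0

theta = np.linalg.solve(Sigma, mu)            # oracle theta = Sigma^{-1} mu
w_oracle = portfolio_from_theta(theta, mu, m)
oracle_risk = m ** 2 / (mu @ theta)           # R(w*) = m^2 / Delta_p
ratio = risk(w_oracle, Sigma) / oracle_risk
```

With an estimate of $\theta$ from (17) and the sample mean in place of the oracle inputs, the same ratio measures how far the estimated portfolio is from the oracle risk.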


Proposition II.4. Fix the mean return level $m$ and assume MP 1-4. In (17) choose $\lambda \ge
C(\Delta_p s (r_2 + r_3^2) + r_3)$, where $C > 0$ is a sufficiently large constant. If $s r_1 + \Delta_p s^2 (r_2 + r_3^2) = o(1)$,
then $\hat w$ is ratio consistent.


Remark 2. In Proposition II.4, $s r_1 + \Delta_p s^2 (r_2 + r_3^2) = o(1)$ is a natural condition since $r_1$ and
$r_2$ control the error in estimating $\theta$ and $\Sigma$, while $s$ and $\Delta_p$ reflect the difficulty of the
high-dimensional problem. In particular, $\Delta_p$ cannot diverge too fast in order to get ratio consistency in
the risk: if $\Delta_p$ diverges faster, then $R(w^*) \to 0$ so quickly that any estimation procedure becomes
inferior to the accurate oracle. Therefore, characterization of the optimality of our procedure
depends on the moment and the dependence conditions on $x_i$ through the rates $r_1$, $r_2$, and $r_3$.
For example, applying Proposition II.4 to SRD time series ($\beta > 1$) with sub-Gaussian tails, we
may take r 2 = = ^y\ogp/n and ri = \J\ogp/n. Then, a suffieient eondition for 

ratio eonsisteney is + Ap)s^^/\ogpJn = o(l). 


C. Sparse full-sample optimal linear prediction 

In this section we consider the optimal linear prediction for a univariate time series. Let $\xi_i$ be
i.i.d. mean-zero random variables with unit variance and

$$ X_i = \sum_{m=0}^{\infty} a_m \xi_{i-m} \tag{18} $$

be a mean-zero univariate linear process, where $|a_m| \le C_0 (1 \vee m)^{-\beta}$ for $m \ge 0$ and $\beta > 1/2$.
Denote $x = (X_1, \ldots, X_n)^\top$ and $\Gamma = E(x x^\top)$ as the auto-covariance matrix of $x$. If $x$ is viewed
as an $n$-dimensional observation, then $\Gamma$ is the covariance matrix of $x$. The optimal one-step
linear predictor for $X_{n+1}$ based on the past sample is $\hat X_{n+1} = \theta^\top x$, where the
coefficient vector $\theta = (\theta_1, \ldots, \theta_n)^\top$ is determined by the Yule-Walker equation

$$ \theta = \Gamma^{-1} \gamma \tag{19} $$

and $\gamma$ is the shifted first row of $\Gamma$. Let $\hat\gamma_s = n^{-1} \sum_{t=1}^{n - |s|} X_t X_{t+|s|}$ be the sample auto-covariances
and

$$ \kappa(x) = \begin{cases} 1, & \text{if } |x| \le 1, \\ g(|x|), & \text{if } 1 < |x| \le c, \\ 0, & \text{if } |x| > c, \end{cases} $$

where the function $g(\cdot)$ satisfies $|g(x)| < 1$, and $c \ge 1$ is a constant. The flat-top
tapered auto-covariance matrix estimator proposed in [?] is

$$ \hat\Gamma_n = (\breve\gamma_{j-k})_{1 \le j,k \le n}, \quad \text{where } \breve\gamma_s = \kappa(|s|/l) \hat\gamma_s, \quad |s| < n. $$
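
The tapered estimator can be assembled in a few lines. The sketch below is illustrative only: it assumes a mean-zero series, sample auto-covariances normalized by $n$, and a trapezoidal choice of the taper function $g$ (linear decay from 1 at $|x| = 1$ to 0 at $|x| = c$), which is one common flat-top shape but is not specified in the text above. All function names are hypothetical.

```python
import numpy as np

def sample_autocov(x, s):
    """Sample auto-covariance at lag s for a mean-zero series, normalized by n."""
    n = len(x)
    s = abs(s)
    return float(x[: n - s] @ x[s:]) / n

def flat_top(t, c=2.0):
    """Flat-top kernel: 1 on [0, 1], assumed linear decay on (1, c], 0 beyond c (needs c > 1)."""
    t = abs(t)
    if t <= 1.0:
        return 1.0
    if t <= c:
        return (c - t) / (c - 1.0)   # assumed trapezoidal g
    return 0.0

def tapered_autocov_matrix(x, l, c=2.0):
    """Toeplitz matrix with entries kappa(|j - k| / l) * gamma_hat(|j - k|)."""
    n = len(x)
    gam = np.array([flat_top(s / l, c) * sample_autocov(x, s) for s in range(n)])
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return gam[lags]

x_demo = np.arange(1.0, 6.0)         # toy series 1, 2, 3, 4, 5
G = tapered_autocov_matrix(x_demo, l=2, c=2.0)
```

Lags up to the bandwidth $l$ are kept unchanged, lags between $l$ and $cl$ are shrunk, and longer lags are set to zero, so the resulting matrix is banded.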
It has been shown in [47] that optimal linear prediction based on the full time series sample can
be achieved by

$$ \tilde\theta = \hat\Gamma_n^{-1} \hat\gamma_n. \tag{20} $$

If the best linear predictor can be approximated by a sparse linear combination in the full sample,
[44] proposed a sparse full-sample optimal (SFSO) linear predictor $\hat\theta$ that solves

$$ \text{minimize}_{\eta \in \mathbb{R}^p} |\eta|_1 \quad \text{subject to } |\hat\Gamma_n \eta - \hat\gamma_n|_\infty \le \lambda, \tag{21} $$

which has a better convergence rate than $\tilde\theta$ in (20). Let $\gamma_0 = E(X_{n+1}^2)$. The risk function
$R(w) = E(w^\top x - X_{n+1})^2 = w^\top \Gamma w - 2 w^\top \gamma + \gamma_0$ is a natural criterion to assess the quality of estimators.
Note that the oracle risk for (19) is $R(\theta) = \gamma_0 - \gamma^\top \Gamma^{-1} \gamma = \gamma_0 - \theta^\top \Gamma \theta$. It was established in [44]
that the SFSO is consistent for estimating the best sparse linear predictor in the $\ell^2$-norm. Here,
we use the ratio consistency criterion to assess the SFSO compared with the oracle predictor $\theta$.
We shall make the following assumptions.

OLP 1: $|\theta|_0 \le s$ and $|\theta|_\infty \le C_0$.

OLP 2: For some constants $K_1, C > 0$, $|\Gamma|_\infty \le K_1$ and $R(\theta) \ge C$.

Assumptions OLP 1-2 are parallel to MP 1 and MP 3. The oracle risk $R(\theta)$ is bounded from below to rule
out the impractical cases where the prediction can be done perfectly using past observations.

Proposition II.5. Let (Xi) be a linear process defined in (18) and ||^i||g < 00 for some g > 4. 
Let r 4 = tq + rs, where tq = or iffi>lorl>fi> 1/2 and = (log/)n“^'||.^o||q. 


where we recall $\beta' = \min(2\beta - 1, 1/2)$. Let $\lambda \ge C(|\theta|_1 + 1) r_4$ in (21). Then we have

$$ |\hat\theta - \theta|_1 = O_P(D(5\lambda |\Gamma^{-1}|_{L_1})). \tag{22} $$

Assume further OLP 1-2. If $D(5\lambda |\Gamma^{-1}|_{L_1}) = o(1)$, then the SFSO linear predictor is ratio
consistent.

Remark 3. In [l47ll . the rate of eonvergenee \0 — 0\2 = Of>{ln~^R + l7d)’ where I 

is the bandwidth of the flat-top matrix taper. Therefore, $\tilde\theta$ is not consistent in the long-range
dependence setting. Finite sample performances based on the relative risk are assessed in Section
III-A. On the other hand, the rate obtained in (22) is sharper than [44, Theorem 2] if $\xi_i$ has
a polynomial tail. This is due to the tighter concentration inequality for $|\hat\Gamma_n - \Gamma|_\infty$ with the
auto-covariance structures (Lemma VI.5).

III. Simulation studies 

Here we shall study how the dependence, dimension and the innovation moment condition
affect the finite sample performance of the linear functional estimate (4). We simulate a variety
of time series of length $n = 100, 200$ while fixing the dimension $p = 100$. We consider three
dependence levels: $\beta = 2, 0.8, 0.6$, corresponding to the SRD ($\beta > 1$), the weak LRD ($1 > \beta > 3/4$)
and the strong LRD ($3/4 > \beta > 1/2$) processes. The coefficient matrices $A_m$ are
formed by i.i.d. Gaussian random entries $N(0, p^{-1})$ multiplied by the decay rates $(1 \vee m)^{-2}$, $(1 \vee m)^{-0.8}$
and $(1 \vee m)^{-0.6}$, respectively. Then 80% randomly selected entries of $A_m$ are further set to zero. Four
types of i.i.d. innovations are included: uniform $[-3^{1/2}, 3^{1/2}]$, standard normal, standardized
double-exponential and Student-$t_3$.
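
A data-generating step of this kind might look as follows. This is a hedged sketch and not the authors' simulation code: the infinite sum in (2) is truncated at a hypothetical number of lags `n_lags`, the entry variance $p^{-1}$ and the decay factor $(1 \vee m)^{-\beta}$ are assumptions read off the decay condition (5), and only standard normal innovations are generated.

```python
import numpy as np

def simulate_linear_process(n, p, beta, n_lags=200, sparsity=0.8, seed=0):
    """Simulate x_i = sum_{m=0}^{n_lags} A_m xi_{i-m} with sparse, decaying A_m."""
    rng = np.random.default_rng(seed)
    # Gaussian entries with variance 1/p (assumed), scaled by (1 v m)^(-beta).
    A = rng.normal(0.0, p ** -0.5, size=(n_lags + 1, p, p))
    decay = np.maximum(1, np.arange(n_lags + 1)) ** (-float(beta))
    A *= decay[:, None, None]
    # Set a random fraction `sparsity` of the entries to zero.
    A *= rng.random(A.shape) >= sparsity
    xi = rng.standard_normal((n + n_lags, p))   # i.i.d. N(0, 1) innovations
    x = np.zeros((n, p))
    for m in range(n_lags + 1):
        # Row i of the slice holds xi_{i-m}; add A_m xi_{i-m} to x_i.
        x += xi[n_lags - m: n_lags - m + n] @ A[m].T
    return x

x_sim = simulate_linear_process(n=100, p=20, beta=0.8)
```

For the long-range dependent settings the truncation level matters, since the coefficients decay slowly; a larger `n_lags` gives a closer approximation at a higher memory cost.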

A data splitting procedure is used to select the optimal tuning parameters. To preserve the
temporal dependence, we split the data into two halves: the first half is used for estimation and
the second half is used for testing. In the linear functional $\theta = \Sigma^{-1} b$, $b$ is chosen such that the
coefficient vector $\theta$ has 80% zeros and 20% i.i.d. non-zeros. Each simulation setup is repeated
100 times and we report the averaged performance for the "block data-splitting" and the
"oracle" estimate. Here, the block data-splitting estimate refers to the validation procedure on
the second-half testing data from the data splitting procedure and the oracle estimate refers to the
validation procedure using the true covariance matrix. Validation procedures are used to select
the tuning parameter $\lambda$ that minimizes the loss $|S_{\mathrm{test}} \hat\theta_{\mathrm{train}}(\lambda) - b|$ and $|\Sigma \hat\theta_{\mathrm{train}}(\lambda) - b|$ for the
data-adaptive estimate and the oracle estimate, respectively. Results are shown in Tables II and III
and Figures 1 and 2.
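
The block data-splitting selection can be sketched as a grid search over $\lambda$. In the code below, `select_lambda` takes any estimator with the hypothetical signature `estimator(S, b, lam)`; the `stand_in_estimator` used in the usage lines is a placeholder (a soft-thresholded ridge solve) chosen only to keep the example self-contained, and it is not the estimator (4).

```python
import numpy as np

def select_lambda(x, b, lambdas, estimator):
    """Pick the lambda minimizing |S_test @ theta_train(lambda) - b|_2 over a grid."""
    n = x.shape[0]
    # Contiguous halves keep the temporal order within each block.
    train, test = x[: n // 2], x[n // 2:]
    S_train = np.cov(train, rowvar=False, bias=True)
    S_test = np.cov(test, rowvar=False, bias=True)
    losses = [float(np.linalg.norm(S_test @ estimator(S_train, b, lam) - b))
              for lam in lambdas]
    return lambdas[int(np.argmin(losses))], losses

def stand_in_estimator(S, b, lam):
    """Hypothetical placeholder: soft-thresholded ridge solve, NOT the estimator (4)."""
    theta = np.linalg.solve(S + 1e-2 * np.eye(S.shape[0]), b)
    return np.sign(theta) * np.maximum(np.abs(theta) - lam, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal((200, 5))
b = np.ones(5)
grid = [0.0, 0.1, 0.5, 1.0]
best_lam, losses = select_lambda(x, b, grid, stand_in_estimator)
```

Replacing `S_test` by the true covariance matrix in the loss gives the corresponding oracle selection.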

A number of conclusions can be drawn from the simulation results. First, we look at the
tuning parameters selected by the block data-splitting procedure. Tables II and III suggest that the
optimal tuning parameters are data-adaptive (w.r.t. the dependence level, tail condition and sample
size) in the sense that they get closer to the optimal constraint parameters validated by
the oracle as the sample size increases. In particular, for each setup $(n, p)$, the optimal constraint
parameter becomes larger as (i) the dependence gets stronger, (ii) the tail gets thicker, and (iii)
the sample size decreases. This is consistent with our theoretical analysis in Section II-A; see
Theorems II.1-II.3.


TABLE II

The optimal constraint parameter $\lambda$ selected by the oracle and the block data-splitting procedure in
the Dantzig selector type estimate for $\Sigma^{-1} b$. Standard deviations are shown in the parentheses. $p = 100$
and $n = 100$.

                      bounded           Gaussian          double-exp        Student-t
beta = 2     oracle   0.1221 (0.0236)   0.1289 (0.0244)   0.1225 (0.0241)   0.1340 (0.0245)
             block    0.1939 (0.0533)   0.1961 (0.0540)   0.1842 (0.0490)   0.2291 (0.0808)
beta = 0.8   oracle   0.2419 (0.0424)   0.2470 (0.0446)   0.2434 (0.0469)   0.2549 (0.0475)
             block    0.4227 (0.1216)   0.4655 (0.1424)   0.4188 (0.1267)   0.4806 (0.1543)
beta = 0.6   oracle   0.4835 (0.0798)   0.4817 (0.0868)   0.4855 (0.0840)   0.4875 (0.0784)
             block    0.9147 (0.2640)   0.9789 (0.2897)   0.9327 (0.2906)   0.9936 (0.2930)

Second, from Figure 1 and Figure 2, it is clear that the Student-$t(3)$ innovations, which have
an infinite fourth moment, uniformly perform worse than the innovations with bounded support,
Gaussian tail and exponential tail. This empirically justifies our theoretical results regarding the
moment/tail condition; see the asymptotic rates of convergence in Section II. Moreover, as with
the optimal tuning parameter, the estimation error also increases as (i) the dependence gets
stronger and (ii) the sample size decreases. In addition, the effect of the innovation distribution
becomes relatively smaller when the dependence strength increases.


TABLE III

The optimal constraint parameter $\lambda$ selected by the oracle and the block data-splitting procedure in
the Dantzig selector type estimate for $\Sigma^{-1} b$. Standard deviations are shown in the parentheses. $p = 100$
and $n = 200$.

                      bounded           Gaussian          double-exp        Student-t
beta = 2     oracle   0.0763 (0.0150)   0.0758 (0.0138)   0.0797 (0.0156)   0.0875 (0.0170)
             block    0.1062 (0.0211)   0.1032 (0.0236)   0.1109 (0.0260)   0.1261 (0.0386)
beta = 0.8   oracle   0.1555 (0.0266)   0.1544 (0.0253)   0.1555 (0.0275)   0.1627 (0.0292)
             block    0.2485 (0.0573)   0.2473 (0.0515)   0.2554 (0.0590)   0.2594 (0.0624)
beta = 0.6   oracle   0.3364 (0.0527)   0.3307 (0.0518)   0.3349 (0.0540)   0.3353 (0.0466)
             block    0.5673 (0.1193)   0.5472 (0.1159)   0.5743 (0.1207)   0.5544 (0.1245)

Fig. 1. Error curves under the $\ell^2$ loss for the linear statistics estimate for $p = 100$ and $n = 100$, with panels
(a) $\beta = 2$, (b) $\beta = 0.8$ and (c) $\beta = 0.6$. The x-axis is the threshold and the y-axis
is the quadratic error. 'ada' means the adaptive block data-splitting procedure and 'orc' means the oracle procedure; 'bd', 'gs', 'de'
and 'st' denote bounded, Gaussian, double-exponential and Student-t distributions, respectively.

Fig. 2. Error curves under the $\ell^2$ loss for the linear statistics estimate for $p = 100$ and $n = 200$, with panels
(a) $\beta = 2$, (b) $\beta = 0.8$ and (c) $\beta = 0.6$. The x-axis is the threshold and the y-axis
is the quadratic error. 'ada' means the adaptive block data-splitting procedure and 'orc' means the oracle procedure; 'bd', 'gs', 'de'
and 'st' denote bounded, Gaussian, double-exponential and Student-t distributions, respectively.

A. Optimal linear prediction

We verify the ratio consistency of the sparse full-sample optimal linear predictor in Section
II-C on finite samples. Partially following the setup in [44], we simulate stationary Gaussian
time series from two models:


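1) AR(14) model: $X_i = \sum_{j=1}^{14} \theta_j X_{i-j} + e_i$, where $\theta_1 = -0.3$, $\theta_3 = 0.7$, $\theta_{14} = -0.2$, and the
rest of $\theta_j = 0$. The errors $e_i$ are i.i.d. $N(0, 1)$.

2) AR(1) model: $X_i = \theta X_{i-1} + e_i$, where $\theta = -0.5$ and $e_i$ are i.i.d. $N(0, 1)$.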
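
The two autoregressive models can be simulated with a short recursion. The sketch below is illustrative: the function name `simulate_ar`, the burn-in length and the seeds are assumptions of this example, with the burn-in used to reduce the influence of the zero initial values.

```python
import numpy as np

def simulate_ar(coefs, n, burn_in=500, seed=0):
    """Simulate X_t = sum_j coefs[j-1] * X_{t-j} + e_t with i.i.d. N(0, 1) errors."""
    rng = np.random.default_rng(seed)
    coefs = np.asarray(coefs, dtype=float)
    d = len(coefs)
    total = n + burn_in + d
    e = rng.standard_normal(total)
    x = np.zeros(total)
    for t in range(d, total):
        # x[t - d:t][::-1] is (X_{t-1}, ..., X_{t-d}), matching coefs order.
        x[t] = coefs @ x[t - d: t][::-1] + e[t]
    return x[-n:]                      # drop the burn-in segment

theta14 = np.zeros(14)
theta14[[0, 2, 13]] = [-0.3, 0.7, -0.2]   # theta_1, theta_3, theta_14
x_ar14 = simulate_ar(theta14, n=200)
x_ar1 = simulate_ar([-0.5], n=200)
```

The same function covers both models because the AR(14) specification only differs in its sparse coefficient vector.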

We take the following competitors of the SFSO: the two versions of ridge corrected shrinkage
predictors (FSO-Ridge, FSO-Ridge-Thr) in [44] and the thresholding (FSO-Th-Raw, FSO-Th-Shr),
shrinkage to a positive definite matrix (FSO-PD-Raw, FSO-PD-Shr) and white noise (FSO-WN-Raw,
FSO-WN-Shr) predictors in [47]. We also run the R function ar() as the benchmark
with the default parameter that uses the Yule-Walker solution with order selection by the AIC. We
fix the tuning parameter $\lambda = \sqrt{\log(n)/n}$ for the SFSO. We try two sample sizes $n = 200, 500$.
We follow the empirical rule for choosing the bandwidth parameter $l$ for all competitors in [47].
The performance of those estimators is assessed by the estimated relative risks. All numbers
in Tables IV and V are averages over 1000 simulation runs. In both the AR(1) and AR(14)
models, our simulation shows that the SFSO is very close to the oracle risk. This confirms our
theoretical findings in Proposition II.5. On the other hand, the shrinkage-based
predictors tend to perform worse relative to the oracle. It is also observed that the AR and
SFSO predictors are comparably the best among all predictors considered here. If we look at the
estimation errors, there is a sizable improvement for the SFSO over the AR due to sparsity; c.f.
[44]. The improved performance for SFSO on the AR(14) model is larger than other methods
(except AR) on the AR(1) model, which is explained by the sparsity structure in the oracle linear
predictor.


TABLE IV

Estimated relative risks for the AR(14) models for n = 200 and n = 500. The oracle risk is one. Standard
errors are shown in parentheses. All method symbols are consistent with [44].

Method           n = 200            n = 500
AR               1.1168 (0.0535)    1.0336 (0.0159)
SFSO             1.1173 (0.0851)    1.0455 (0.0256)
FSO-Ridge        1.3443 (0.2433)    1.2897 (0.4119)
FSO-Ridge-Thr    1.4076 (0.3525)    1.3913 (0.8883)
FSO-Th-Raw       2.4623 (3.3663)    13.4350 (74.0697)
FSO-Th-Shr       1.6530 (0.8478)    3.3540 (9.6394)
FSO-PD-Raw       1.4930 (0.3388)    1.4475 (0.5842)
FSO-PD-Shr       1.4584 (0.3127)    1.3361 (0.2087)
FSO-WN-Raw       2.1798 (2.9911)    10.7390 (62.8709)
FSO-WN-Shr       1.6859 (1.2386)    4.1574 (15.2984)


IV. Real data analysis 
A. Task classification for fMRI data 

In this section, we apply the methods in Section to a real data for the cognitive states 
classification using the fMRI data. This publicly available dataset is called StarPlus. In this 
fMRI study, during the first four seconds, a subject sees a picture such as ±, i.e. the symbol 
stimulus. Then after another four seconds for a blank screen, the subject is presented a sentence 
like “The plus sign is above on the star sign.”, i.e. the semantic stimulus, which also lasts for 
four seconds, followed by an additional four blank seconds. One Picture/Sentence switch is 
called a trial and 20 such trials are repeated in the study. In each trial, the first eight seconds are 
considered as the “Picture” (abbr. “P”) state and the last eight seconds belong to the “Sentence” 
(abbr. “S”) state. Sampling rate of the fMRI image slides is 2Hz and each slide is a 2-D image 
TABLE V

Estimated relative risks for the AR(1) models for n = 200 and n = 500. The oracle risk is one. Standard
errors are shown in parentheses. All method symbols are consistent with [44].

Method           n = 200            n = 500
AR               1.0171 (0.0270)    1.0062 (0.0108)
SFSO             1.0310 (0.0274)    1.0120 (0.0104)
FSO-Ridge        1.0314 (0.0188)    1.0128 (0.0103)
FSO-Ridge-Thr    1.0530 (0.0383)    1.0155 (0.0182)
FSO-Th-Raw       1.1055 (0.1520)    1.0161 (0.0232)
FSO-Th-Shr       1.0984 (0.1294)    1.0161 (0.0232)
FSO-PD-Raw       1.0367 (0.0224)    1.0138 (0.0109)
FSO-PD-Shr       1.0310 (0.0187)    1.0122 (0.0088)
FSO-WN-Raw       1.0694 (0.0608)    1.0161 (0.0232)
FSO-WN-Shr       1.0645 (0.0519)    1.0161 (0.0232)


containing seven anatomically defined Regions of Interest (ROIs)¹. In this data analysis, we
use four ROIs² and each ROI may have a varying number of voxels (i.e. the 3-D pixels) for
different subjects. The four ROIs contain 728-1120 voxels in total, depending on the subject.
Therefore, for each subject, we have two multi-channel time-course data matrices: one has 320
time points with “S” state and the other has 320 time points with “P” state, both having the
dimension $p$ equal to the number of voxels in that subject. Therefore, this is a high-dimensional
time series dataset ($p > n$). We assume that the overall time-course data are covariance stationary
and standardize the data to unit diagonal entries in the covariance matrix. The goal of this study
is to classify the state of the subject (“P” or “S”) based on the past fMRI signals.

The classifier considered here is the regularized linear discriminant analysis (RLDA). Let $\Sigma$
be the pooled covariance matrix for the two states, $\hat\mu_s = n_s^{-1} \sum_{i \in \text{state } s} x_i$ be the sample mean
for the state $s \in \{P, S\}$, and $n_s$ be the number of time points in state $s$. The RLDA classifier

¹ The seven ROIs are: 'CALC', 'LDLPFC', 'LIPL', 'LIPS', 'LOPER', 'LT', 'LTRIA'.

² The selected four ROIs used in our analysis are: CALC, LIPL, LIPS, LOPER.
TABLE VI

Accuracy of the RLDA classifier (23), with different estimates of the pooled covariance matrix $\Sigma$ (with
thresholding), its inverse $\Sigma^{-1}$ (graphical Lasso), its linear functional $\Sigma^{-1}(\mu_P - \mu_S)$, and the GNB
classifier. Four ROIs - CALC, LIPL, LIPS, LOPER - are used in the “Picture/Sentence” dataset.

Subject    # Voxels    Thresholded $\Sigma$    Graphical Lasso $\Sigma^{-1}$    Linear functional    GNB
04799      846         85%                     90%                              95%                  80%
04820      728         95%                     100%                             95%                  95%
04847      855         90%                     90%                              95%                  85%
05675      1120        95%                     95%                              100%                 95%
05680      1051        90%                     85%                              85%                  70%
05710      810         95%                     95%                              100%                 90%
Average    901.67      91.67%                  92.50%                           95.00%               85.83%
Std        150.87      4.08%                   5.24%                            5.48%                9.70%


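associates a new observation $z$ to the label $s \in \{P, S\}$ according to the Bayes rule

$$\hat s = \begin{cases} P, & \text{if } -(z - \mu)^\top \Sigma^{-1} b + \log(n_S/n_P) < 0, \\ S, & \text{otherwise,} \end{cases} \qquad (23)$$

where $\mu = (\mu_P + \mu_S)/2$ and $b = \mu_P - \mu_S$, where $\mu_s$ is the mean for the group $s \in \{P, S\}$. Note
that (23) is also equivalent to maximizing the score function

$$\mathrm{score}(s) = -\tfrac{1}{2}(z - \mu_s)^\top \Sigma^{-1} (z - \mu_s) + \log(n_s/n), \qquad n = n_P + n_S;$$

i.e. $\hat s = \operatorname{argmax}_{s \in \{P, S\}} \mathrm{score}(s)$. Clearly, $\mu_s$ and $\Sigma$ are unknown and they need to be estimated
from the training data.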
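As an illustration of how rule (23) is applied once an estimate of the direction $\Sigma^{-1}b$ is available, consider the following minimal sketch. The function name `rlda_classify` and the toy numbers are hypothetical; `theta` stands for whichever estimate of $\Sigma^{-1}b$ is plugged in.

```python
import math


def rlda_classify(z, mu_p, mu_s, theta, n_p, n_s):
    """Apply rule (23): "P" if -(z - mu)^T theta + log(n_S / n_P) < 0, else "S".

    theta stands for an estimate of Sigma^{-1} b with b = mu_P - mu_S, and
    mu = (mu_P + mu_S) / 2 is the midpoint of the two group means.
    """
    mu = [(a + b) / 2.0 for a, b in zip(mu_p, mu_s)]
    projection = sum((zj - mj) * tj for zj, mj, tj in zip(z, mu, theta))
    statistic = -projection + math.log(n_s / n_p)
    return "P" if statistic < 0 else "S"


# Toy example with Sigma equal to the identity, so theta = b = mu_P - mu_S.
mu_p, mu_s = [1.0, 0.0], [-1.0, 0.0]
theta = [p - s for p, s in zip(mu_p, mu_s)]
label_a = rlda_classify([0.5, 0.2], mu_p, mu_s, theta, n_p=320, n_s=320)
label_b = rlda_classify([-0.5, 0.2], mu_p, mu_s, theta, n_p=320, n_s=320)
```

Only the vector `theta` enters the decision, which is why a direct estimate of the linear functional can be used without forming an estimate of $\Sigma^{-1}$ itself.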

We first perform an exploratory data analysis to evaluate the suitability of our method. As is
widely recognized in the neuroscience literature, sparsity is an important feature of
high-resolution imaging data. It is well grounded to believe that $\Sigma^{-1}$ and $\Sigma^{-1}b$ are both sparse (see, for
example, [48], [49]). In addition, we plot the auto-covariance functions (acf) for some voxels.
Since the S and P states have the block design, we concatenate the blocks with the same label S
and P along the time index and make the sample acf plots for each of the two states. Figure 4
shows that some voxels exhibit a certain long-memory feature. It is well understood that
the power spectral density for the fMRI signals has the “power law” property, suggesting the
long-memory behavior of the fMRI time series; see e.g. [50], [51]. Moreover, it has been observed in the
fMRI literature that the signals have light tails (e.g., they are shown to be sub-Gaussian
by [51]). We further make QQ-norm plots for some voxels (see Figure 3), which suggest that
the scenario falls into the consideration of Theorem II.1, which allows an ultra-high dimension
$p$. In our data analysis, the largest number of voxels in all four ROIs is $p = 1120$, while the
number of time points is $n = 320$, and Theorem II.1 is apparently suitable.


Fig. 3. QQ-norm plots of 9 voxels.

To perform the classification, we use the sample mean estimate $\hat\mu_s$ for $\mu_s$. Since this fMRI
study has a block design, meaning that each state lasts for eight consecutive seconds, we average
the testing data in eight-second windows as new observations. In our experiment, we take six
subjects³ and train an RLDA for each subject. Parameter tuning is performed by the same data

³ The six subjects are: 04799, 04820, 04847, 05675, 05710 and 05680.
Fig. 4. Sample plots for the time series and auto-covariance function of four voxels of the subject 05680. The first two and
last two rows are from the training data for S and P, respectively.


splitting procedure used in our simulation studies in Section III: the first 10 trials are used as the training
dataset (320 time points) and the second 10 trials (320 time points) are used as the testing dataset. We
compare the RLDA with the thresholded sample covariance matrix estimate, the precision matrix
estimate by the graphical Lasso and the linear functional estimate, all plugged into (23). Tuning
parameters are selected by minimizing the Hamming error on the testing dataset. We also compare
with the performance of the Gaussian Naive Bayes classifier (GNB)⁴. The GNB has the same
decision rule (23), with the difference that the diagonal matrix of the sample covariances is used
to estimate $\Sigma$. The performance of all classifiers is assessed by the accuracy, which is shown in
Table VI.
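The window averaging of the testing data and the accuracy computation can be sketched as follows. The helper names are hypothetical, and the window length of 16 time points is an inference from the stated 2 Hz sampling rate and eight-second states rather than a number quoted in the text.

```python
def window_means(series, window=16):
    """Average a list of p-dimensional observations over consecutive windows.

    window=16 corresponds to eight seconds at a 2 Hz sampling rate; a trailing
    incomplete window, if any, is dropped.
    """
    means = []
    for start in range(0, len(series) - window + 1, window):
        block = series[start:start + window]
        dim = len(block[0])
        means.append([sum(obs[j] for obs in block) / window for j in range(dim)])
    return means


def accuracy(predicted, truth):
    """Fraction of labels predicted correctly (one minus the Hamming error rate)."""
    hits = sum(1 for a, b in zip(predicted, truth) if a == b)
    return hits / len(truth)


# 320 toy time points with p = 2 channels give 20 averaged observations.
series = [[float(t), 1.0] for t in range(320)]
averaged = window_means(series)
```

Under this reading, each subject contributes 20 averaged test observations, which would be consistent with the accuracies in Table VI being multiples of 5%.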


There are interesting observations we can draw from Table VI. First, we see that, in general,
the RLDA classifiers achieve higher accuracy than the GNB classifier. Specifically, the accuracy of
the RLDA with the three estimates is: (91.67 ± 4.08)% for RLDA with the thresholded estimate,
(92.50 ± 5.24)% for RLDA with the graphical Lasso estimate and (95.00 ± 5.48)% for RLDA
with the linear functional estimate. The accuracy of the GNB is (85.83 ± 9.70)%. The difference is likely
to be explained by the fact that the GNB assumes an independence structure on the covariance
matrix $\Sigma$, which is very demanding and can potentially cause serious misspecification problems,
as indicated by the lowest accuracy in the classification task. By contrast, the RLDA with the
three regularized estimates of $\Sigma$, $\Sigma^{-1}$ or $\Sigma^{-1}b$ is more flexible and it adaptively balances
the bias and variance in the estimation. Second, among the three RLDA classifiers, we see that
the RLDA with direct estimation of the Bayes rule direction $\Sigma^{-1}b$ has the highest accuracy,
followed by the RLDA with the graphical Lasso estimate. As shown in Section II-A,
the rate of convergence for direct estimation of $\Sigma^{-1}b$ can be guaranteed, while it is unclear
whether the consistency of estimating $\Sigma$ or $\Sigma^{-1}$ implies the same property for estimating
$\Sigma^{-1}b$ with the natural plug-in estimates. In addition, from the scientific viewpoint, it appears
to be a meaningful assumption that effective prediction is based on a small number of voxels
in the brain, since different ROIs may control different tasks and subjects can only perform one
task at each time point in the fMRI experiment.


B. Markowitz portfolio allocation 

Here we apply the direct estimation for linear functionals in high-dimensional MP allocation.
We use the daily value-weighted returns from January 2005 to March 2015, for 100 portfolios
formed on size and the ratio of book equity to market equity, i.e. the intersections of 10 market
equity portfolios and 10 book-to-market ratio portfolios. These portfolios are constructed
from the Center for Research in Security Prices (CRSP) database and were obtained from the Kenneth
French data library.

⁴ The LDA is not applicable here since the sample covariance matrix $\hat S_n$ on the training data is singular.

The expected return is fixed to m = 1. At the end of each month from July, 2005 to 
March, 2015, the portfolios are invested and held for one month with rebalancing. The portfolio 
allocation weights are estimated using the past 6-month data. Here p = 100. The sample size 
n = 129 approximately as the number of trading days varies slightly from month to month. 
Four estimators are considered: (1) the linear functional estimator with Ai; (2) plug-in estimator 
using the portfolio daily return mean and the sample covariance matrix from the past data. (We 
use the Moore-Penrose generalized inverse when the sample covariance matrix is singular;) (3) 
plug-in estimator using the portfolio daily return means and the graphical lasso precision matrix 
estimator from the past data; (4) the ridge shrinkage estimator of the covariance matrix by 

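$$\text{minimize}_{w \in \mathbb{R}^p} \; w^\top (\hat S_n + \lambda\, \mathrm{Id}_p)\, w \quad \text{subject to} \quad w^\top \hat\mu = 1.$$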
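For orientation, an equality-constrained quadratic program of this form has a closed-form solution by the standard Lagrange multiplier argument. The derivation below is a sketch in generic notation: $A$ denotes any positive definite matrix (for instance a ridge-regularized covariance estimate with $\lambda > 0$), $\hat\mu$ a mean estimate and $m$ the target return; these symbols are used here for illustration only.

```latex
\begin{align*}
&\text{minimize}_{w \in \mathbb{R}^p} \; w^\top A w
  \quad \text{subject to} \quad w^\top \hat\mu = m, \\
&\mathcal{L}(w, \gamma) = w^\top A w - 2\gamma\,(w^\top \hat\mu - m)
  \;\Longrightarrow\; A w = \gamma \hat\mu
  \;\Longrightarrow\; w = \gamma A^{-1} \hat\mu, \\
&w^\top \hat\mu = m
  \;\Longrightarrow\; \gamma = \frac{m}{\hat\mu^\top A^{-1} \hat\mu}
  \;\Longrightarrow\; w^\ast = \frac{m\, A^{-1} \hat\mu}{\hat\mu^\top A^{-1} \hat\mu}.
\end{align*}
```

The solution depends on $A$ only through the vector $A^{-1}\hat\mu$, which is the linear functional of the precision matrix that the direct estimator targets.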

The tuning parameters are selected by the following data-driven steps. We partition the data
into $K = 17$ consecutive periods. Each period consists of $n_{\text{train}} = 125$ daily returns as training
data and $n_{\text{test}} = 21$ daily returns as testing data. The information ratio is computed as

$$\mathrm{IR}(\lambda) = \sum_{k=1}^{K} \frac{w_k^\top(\lambda)\, \hat\mu_{k,\text{test}}}{\big(w_k^\top(\lambda)\, \hat S_{k,\text{test}}\, w_k(\lambda)\big)^{1/2}},$$

where $w_k(\lambda)$ is the portfolio allocation weight computed using the $k$th period training data
with parameter $\lambda$; $\hat\mu_{k,\text{test}}$ and $\hat S_{k,\text{test}}$ are the sample mean and sample covariance of the $k$th
period testing data. The parameters are selected to maximize the information ratio over a grid
of [0, 0.1], [0, 0.2], and [0, 2] for the linear functional optimization, graphical lasso and ridge
shrinkage, respectively.
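The selection step can be sketched as follows, reading the information ratio as the sum over periods of the test-period mean return divided by the test-period standard deviation of the portfolio. The names `information_ratio`, `select_parameter` and the callback `fit_weights` are hypothetical and only illustrate the grid search.

```python
import math


def quad_form(w, cov):
    """Compute w^T cov w for a list w and a nested-list matrix cov."""
    size = len(w)
    return sum(w[i] * cov[i][j] * w[j] for i in range(size) for j in range(size))


def information_ratio(weights, test_means, test_covs):
    """Sum over periods k of w_k^T mu_k / sqrt(w_k^T S_k w_k)."""
    total = 0.0
    for w, mu, cov in zip(weights, test_means, test_covs):
        mean_return = sum(wi * mi for wi, mi in zip(w, mu))
        total += mean_return / math.sqrt(quad_form(w, cov))
    return total


def select_parameter(grid, fit_weights, test_means, test_covs):
    """Return the grid value with the largest information ratio.

    fit_weights(lam) is a hypothetical callback returning the list of
    per-period weights w_k(lam) fitted on the training data.
    """
    return max(
        grid,
        key=lambda lam: information_ratio(fit_weights(lam), test_means, test_covs),
    )
```

For example, with a single period, `select_parameter([0.0, 0.5, 1.0], lambda lam: [[1.0 - lam, lam]], [[1.0, 2.0]], [[[1.0, 0.0], [0.0, 1.0]]])` compares the three candidate weight vectors and returns the value with the best return-to-risk ratio.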

The selected tuning parameters for the linear functional optimization, graphical lasso and ridge
shrinkage are 0.03, 0.039 and 1.2, respectively. Means of the monthly return for the constructed asset
portfolios are calculated to represent actual return levels. We also estimate the one-month risk
$w^\top \hat S_{\text{one-month}}\, w$ using the estimated weights and the sample covariance of the daily data of the
next month. The graphical lasso with parameter 0.039 has mean return 1.62 and risk 3.02, both
of which are lower than those of the other methods. To make the comparison easier, we present the
mean return and risk calculated with parameter 0.15. The result is shown in Table VII. It is
observed that the linear functional estimator for the Markowitz portfolio allocation performs the
best among the four methods in terms of mean return and risk. The performance of the ridge
shrinkage is better than the plug-in estimator, but it is worse than the proposed linear functional
estimator.


TABLE VII

Estimated mean return and risk of the Fama-French 100 portfolios analysis.

               Functional    Plug-in    Glasso    Ridge
Mean Return    2.45          2.00       2.38      2.37
Risk           3.96          9.08       4.17      4.57
Supplemental material: proofs


In this supplemental material, we prove the main results and technical lemmas of the paper.
Equation and reference numbers in the supplemental material continue from the main paper.


V. Proofs 


A. Proofs of Theorems II.1-II.3


The proofs of Theorems II.1-II.3 rely on the following lemma.


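Lemma V.1. Let $\hat b$ be an estimator of $b$ and $\lambda \ge |\theta|_1 |\hat S_n - \Sigma|_\infty + |\hat b - b|_\infty$. Then, $\theta$ satisfies
$|\hat S_n \theta - \hat b|_\infty \le \lambda$. For the estimate $\hat\theta := \hat\theta(\lambda)$, we have

$$|\hat\theta - \theta|_w \le \big[6 D(5\lambda |\Sigma^{-1}|_{L^1})\big]^{1/w} \big(2\lambda |\Sigma^{-1}|_{L^1}\big)^{1 - 1/w} \qquad (24)$$

for $1 \le w \le \infty$.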


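Proof. Since $\theta = \Sigma^{-1} b$, we have

$$|\hat S_n \theta - \hat b|_\infty = |\hat S_n \theta - \Sigma\theta + b - \hat b|_\infty \le |\hat S_n - \Sigma|_\infty |\theta|_1 + |\hat b - b|_\infty \le \lambda.$$

Therefore, $\theta$ is feasible for the optimization problem defining $\hat\theta$ with such choice of $\lambda$, and $|\theta|_1 \ge |\hat\theta|_1$. Then

$$|\Sigma\hat\theta - b|_\infty \le |\Sigma\hat\theta - \hat b|_\infty + |\hat b - b|_\infty \le |\hat S_n \hat\theta - \hat b|_\infty + |\hat S_n - \Sigma|_\infty |\hat\theta|_1 + |\hat b - b|_\infty \le 2\lambda.$$

It follows that $|\hat\theta - \theta|_\infty \le |\Sigma^{-1}|_{L^1} |\Sigma(\hat\theta - \theta)|_\infty \le 2\lambda |\Sigma^{-1}|_{L^1}$. Next, we bound $|\hat\theta - \theta|_1$. Let
$\delta = \hat\theta - \theta$ and $u = |\delta|_\infty$. For any constant $a > 1$, let further $\delta^1_j(a) = \hat\theta_j I(|\hat\theta_j| \ge au) - \theta_j$ and
$\delta^2_j(a) = \delta_j - \delta^1_j(a)$ for $j = 1, \ldots, p$. So $\delta = \delta^1(a) + \delta^2(a)$ and

$$|\theta|_1 \ge |\hat\theta|_1 = |\delta^1(a) + \theta|_1 + |\delta^2(a)|_1 \ge |\theta|_1 - |\delta^1(a)|_1 + |\delta^2(a)|_1,$$

which implies that $|\delta^1(a)|_1 \ge |\delta^2(a)|_1$ and $|\delta|_1 \le 2|\delta^1(a)|_1$. Now, observe that

\begin{align*}
|\delta^1(a)|_1 &= \sum_j |\hat\theta_j I(|\hat\theta_j| \ge au) - \theta_j| \\
&= \sum_j |\hat\theta_j - \theta_j| I(|\hat\theta_j| \ge au) + \sum_j |\theta_j| I(|\hat\theta_j| < au) \\
&\le u \sum_j I(|\theta_j| \ge (a-1)u) + \sum_j |\theta_j| I(|\theta_j| < (a+1)u) \\
&\le (a-1)^{-1} D((a-1)u) + D((a+1)u).
\end{align*}

Taking $a = 1.5$, we obtain

$$|\hat\theta - \theta|_1 \le 6 D(5\lambda |\Sigma^{-1}|_{L^1}).$$

Now, (24) follows from the interpolation of the $\ell^w$ norm by the $\ell^\infty$ and $\ell^1$ norms, $|\delta|_w^w \le |\delta|_\infty^{w-1} |\delta|_1$.

□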
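The interpolation step at the end of the proof, $|\delta|_w \le |\delta|_1^{1/w} |\delta|_\infty^{1-1/w}$, can be checked numerically on any vector. The snippet below is a small illustrative check with hypothetical helper names; it is not part of the argument.

```python
def lw_norm(vec, w):
    """Vector l^w norm for 1 <= w < infinity."""
    return sum(abs(v) ** w for v in vec) ** (1.0 / w)


def interpolation_bound(vec, w):
    """Right-hand side |v|_1^{1/w} * |v|_inf^{1 - 1/w} of the interpolation inequality."""
    l1 = sum(abs(v) for v in vec)
    linf = max(abs(v) for v in vec)
    return l1 ** (1.0 / w) * linf ** (1.0 - 1.0 / w)


# Pairs (left-hand side, right-hand side) for several values of w.
delta = [3.0, -4.0, 1.0, 0.0, 0.5]
checks = {w: (lw_norm(delta, w), interpolation_bound(delta, w)) for w in (1, 2, 3, 10)}
```

Each entry of `checks` holds the two sides of the inequality; for $w = 1$ they coincide, and the gap between them closes again as $w$ grows.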


Let Gi = {|b — b|oo < Cfer^} and G 2 = {|5„ — S|oo < G^Jn}, where Jn is a sequenee of 
real numbers. Then, P((j'i) > 1 — 2p~^^. In addition, under different dependenee and moment 
eonditions, we need to find a Jn sueh that P(G 2 ) ^ G 4 _p~^^ for some eonstants (^ 4 , C 5 > 0 
depending on G^. 

Sub-Gaussian innovations. Let $x_* = C_3 \max\{(L_{n,\beta} \log p)^{1/2}, J_{n,\beta} \log p\}$. By Lemma VI.2
and the union bound, we have

\begin{align*}
P(|\hat S_n - \Sigma|_\infty \ge x_*) &\le 2p^2 \max\{\exp(-C x_*^2 / L_{n,\beta}), \exp(-C x_* / J_{n,\beta})\} \\
&\le 2p^2 \max\{p^{-C C_3^2}, p^{-C C_3}\} \le 2p^{-C C_3 + 2},
\end{align*}

where the last step follows from $C_3 > \max(1, 2C^{-1})$. Therefore, we can take $C_4 = 2$ and
$C_5 = C C_3 - 2 > 0$. By Lemma V.1, on the event $G_1 \cap G_2$, which occurs with probability at
least $1 - P(G_1^c) - C_4 p^{-C_5}$, we have

$$|\hat\theta - \theta|_w \le \big[6 D(5\lambda |\Sigma^{-1}|_{L^1})\big]^{1/w} \big(2\lambda |\Sigma^{-1}|_{L^1}\big)^{1 - 1/w}.$$


Now, ( pTj ) is immediate. Note that ( 12) easily follows from ( [TT| ) in view of D{u) < MpU^~^ and 
|0|i < for 6 G Qr{i',Mp). 

Sub-exponential innovations. Let x* = Ci(logp)^“+^n“^'. By Lemma 


VL4 


we have 


P(|^n - S|oo > X^) 

< C2P^ logp) = 


Choose C3 = _ 2 > 0. Then, Theorem 


II.2 


follows from Lemma 


V.l 


Polynomial moment innovations. Suppose that ||^i,i||q < cxd. First, consider /3 e [1 — 


1/q, 00). By Lemma VL3 we have for all x > 0 

P(|^„ - E|oo > x) < ^ exp(-CW)]. 


(25) 


Using Xe = + Ci(log(p)/?7,)^/^ in the above inequality, we get P(|^n — S|oo > 

Xs) < where €2 = C and C3 = C'C\ — 2. For /3 e (1/2,1 — 1/g), by Lemma 

|VL31 we have for all x > 0 


P(|^„ - E|oo > x) < Cp^n^-^l^^x-^l^ + ^ exp(-C'W)]. (26) 

Using x^ = in the last inequality, we get P(|^„—E|oo > x^) < 

with C 2 = 2C'. 

□ 


B. Proofs of Results in Sections II-B and II-C 


Proof of Proposition II.4. 


By construction. 


i?(w) Ap0 E0 

i?(w*) ~ A2 

^ ' p,n 


fj:0/Ap 

(xT0/Ap)2 


V.l 


Note that Sn = Sn + Un where Un = — With our ehoiee of A, by Lemma 

and MP 1, 2, |0|i is bounded in probability by |0|i. By MP 1, 2, 3, and 4, we have 

E6-6^E6\ < \f {EO - Sn0)\ + \{o'^Sn-O^E)e\ 

< \f E{e-6)\ + \{e-ey^ei 

+2\f{Sr.-j:)o\ 

< |su(|0|i + |0|i)|0-0|i 

+2(|^n — S|oo + |x~ 


Be aware that ri depends on A. Sinee |0|i = O(Aps), we have 

ri\0\i + (rs + rl)\0\l 


Similarly, 




A. 


- 1 




A. 


=0 (sri + ApS^(r2 + rl)) . 


\0 X — < \0 (x —^)| + |(0 — 0)^ 

< \0\i\^ - ^J,\oo + \0 - 0\l\^J>\oo 

= Op(ApSr 3 + ri). 


Therefore, 


0^x 


A, 


ri 


= Op(sr3 + —). 

LAy) 


By MP 3, Ap > m?/C. If sri +ApS^(r 2 +r|) = o(l), then the result follows from the eontinuous 
mapping theorem. □ 

Proof of Proposition [//.Jl By the deeomposition in Theorem 2], we have 


iTn-Tloo < T + n max sh^l + max , 

l<s<[dj l<s<n-l 


where 


T = n max 

0<s<[dj 


Z=1 


- EX,;W 




Since \am\ < CQm~^ for m > 1, by Lemma VI.l = 0{s~^) and for /3 > 1 and 

1 > /3 > 1/2, resp. Therefore, we have maxi<s<Ldj s| 7 s| = 0(1) or if /3 > 1 or 

1 > f3 > 1/2; and max;< 5 <„_i 17^1 = 0{l~^) or 0(/^“^^) if/3>lorl>/3> 1/2. By Lemma 

The ratio eonsisteney of 6 follows from the 


VL5 


T = Op(r 5 ). Then (22) follows from 


V.l 


assumption that R{6) > C > 0 and 


R{e)-R{e) = 7 T-V + 6> r6/-26> 7 


= 6>^r0 + 6> r6> 


2 e^Te 


= {0-eyT{e-0) 

< Ki\0-0\l. 


□ 


VI. Technical lemmas 

In this section, we prove the technical lemmas that are used in Section V of this supplemental
material.

Lemma VI.l. Let (3 > 1/2 and {am)mez be a real sequence such that < Com~^ for m > 1 
and am = Oifm<0. Let jk = Ylm=o\^rnam+k\, 4 = \ak\Ak+i, where 

n(5^fc=i+l ■ Let bs^m Ond bs^m,m' T 

ai_rn'ai+s-m- Then (i) 7„ = 0{n~^) (resp. 0(n"Mogn), or 0{n}~‘^^)) and YJk=o'lk = 0(1) 

(resp. O(log^n), or 0(n^“^^)) hold for /3 > 1 (resp. 13 = 1, or 1 > 13 > 1/2); (ii) On = 
0(^-2/3+i/2); j YZk=o'lk = ^(1) (r^sp. O(logn), or 0(rA~^^)) and 6n = 0{n) (resp. 
0{n\og^n), orO(rA-'^f^))for{3 > ‘i/A(resp. (3 = 3/4, or3/4 > {3 > 1/2); (iv) Emez= 
0(n); (v)for q>2, 'Em'<m^^o<s<n \bs,m,m'\‘^ = 0(n) (resp. O(nlogn), or 0(n^Ai-‘^3)q)) far 
(3 > 1/2+1/(2q) (resp. ^ = l/2+l/(2g), or l/2+l/(2g) > (3 > 1/2),- (vi) Emez'^axo<s<„(Em'<m 
= 0 ( 0 ) (resp. 0{nlog^ n), or 0(rA~^^)) for (3 > 3/4 (resp. (3 = 3/4, or 3/4 > (3 > 1/2). The 
constants of O(-) only depend on Oq and (3 for (i)-(iv) and (vi), and they may also depend on 
qfor (v). 


Lemma VI.1 follows from elementary manipulations. The details are omitted. In Lemmas VI.2, VI.3
and VI.4, we assume that the linear process has mean zero and $\hat S_n = n^{-1} \sum_{i=1}^{n} x_i x_i^\top$.


Lemma VI.2 (Sub-Gaussian). Let {^ij) be Ltd. satisfying Assume Q. Then for all x > 0 


P(|sjfc - ajk\ >x)< 2exp 


—C min 


X 


X 


^n,l3 'bn,f} 


(27) 


where Jn,/ 3 ) = {n ^,n ^), (n and {n^ for (3 > 1, 1 > /3 > 3/4 and 

3/4 > /) > 1/2, respectively, and C is a constant that only depend on {3, Cq in Q and in 


Proof Letr/= 


\T 


and 


= 


Aoj- 

Ai,j- 

A2,j- ■ 


A 

^n,j- 

0 

Ao,j- 

Aij- ■ 

^n-2,j- 

An-1,j 

• • o 

0 

Aoj- ■ 


An-2, j 

0 

0 

0 ■ 


Al,j- 




Observe that - Then nsjk = rf' Since are i.i.d. 

sub-Gaussian, by the Hanson-Wright inequality [[5^ Theorem 1.1], 

(28) 


P {\ri^[A^^'^y- E{r]'^{A^^^A^’‘'>r]) \ > x) 


< 2 exp < —C min 


|(^0-))T^W|-2^2^ ^((^0'))T^W)-^ 

where G is a constant independent of p, n and x. Let Then, T^^) has the 

same set of nonzero real eigenvalues as {A^l'>y Since 


, -1 


|(^0'))T^W1 


= tr 


ylO-)(yl(i))^yl(fc)(ylW)^l < |rW| IT^^) 


and 




the right-hand side of ( [28] ) is bounded by 

-C min 


< 2 exp 


X 


X 


maXj<p\T(A\y maXj<pp(Td)) J_ 

By the Cauchy-Schwarz inequality, we have 


( 29 ) 


:= Y1 


m=0 


oo / p 

m+l,j-\ — ^m,jk 

m=0 Vfc=l 


1/2 


1/2 


E 

V k=l 


m+ljk 
By the decay condition and Lemma VI.1 (i), the quantity defined in (29) is $O(l^{-\beta})$ if $\beta > 1$ and
$O(l^{1-2\beta})$ if $1 > \beta > 1/2$, uniformly over $j$. Here and hereafter in this proof, the constants of
$O(\cdot)$ only depend on $C_0$ and $\beta$. Also by Lemma


VI.l 


_ < ^^70 + 2 Er=i - 07« < 

, which is of order 0{n) or 0(77,^““^^) for (3 > 3/4 or 3/4 > (3 > 1/2, respectively. 
Similarly, since = ^(1) for /3 > 1 or 1 > /3 > 1/2, 

respectively. Now, (jST]) follows from Lemma |VL1| and 


□ 


In the following Lemma VL3| and |VL4 , without loss of generality, we may consider the mean- 
zero linear process Xi := Xu = X]m=o where am is the first row of Am such that in 

accordance to (|^, la^l < Co(l V m)~^, for m > 0,/9 > 1/2, and = 0 if m < 0. Let 
Sn = n-1 Er=i and = EX/. 

Lemma VI.3 (Polynomial moment). Let q > 2 and be i.i.d. random variables such that 
< oo. Assume ^ holds. Let = max(||^/ 1 -1||5, ||^i,i||29). Then (i) If (3 > l-l/(2g), 
then we have for all x > 0 


/io,gV||6,i||g 


Pd^n-a^l >X)<C 
(ii) If 1 - l/(2g) > /) > 1/2, then 
P(|^n-crd >x)<C 


nq 


+ exp 


Cnx"^ 


do ,2 V 11^ 


dO,q 


+ 




flQ—^X'l 77,29(2/3—l)^2g 

where the constants C and C only depend on q,l3 and Cq. 


+ exp 




Cnx^ 
do ,2 


(30) 


(31) 


Proof. In this proof, the constants and the constants in O(-) only depend on Co, (3 

and q. Let = Yh=i where 


m—1 


^ ^ ^ ^ ^rn^i—rn^i—rn'^rn^ ^ ^ ^ ^ ^i—rn^rn^rn'^i—rn'’ 


m£7j m'=m-\-l 


mEZ m '=—00 


Let Zm = ^m^m ~ bc lid random matrices in Write Sn = Ln + 2Qn, where 

n n 

Ln ^ ^ ^ ^ ^ ^ dll^ZmBm') i Bm ^ ^ 


i=l mSZ 


mSZ 


i=l 


Since G Z are independent random variables with mean zero, by Corollary 1.7 in 

(531, we have for all a; > 0 


^{\Ln\ >x)< Cr 






+ 2 exp 


C2X^ 




Note that 


E\triZmBm)\^<C^ 


-1 


E 


^ ^ ^m,ssBm,s 


s=l 


+ E 


'y ''j 'y ''j ^m,stByn,st 


S=1 t<s 


Sinee {irn,s Y.t<s Bm,st^m,t)s=i,-,p is a martingale differenee sequenee w.r.t. ■ ■ ■ , 

we have by Burkholder’s inequality (541 

p p 

(32) 


II < {q-l)J2^LssUlo-m 

S = 1 

< (9-1)" ^^<,,116 


o.ollg- 


(33) 


S=1 t<s 


S=1 t<s 


Therefore, it follows that E|tr(Zmi?m)|'^ ^ f^o,q\Bm\F- By the Cauehy-Sehwarz inequality and 
0, we have 

'—^ n n 

l^mlF < \sii-m? = 0(^{i - m)“^^) if i > m, 

i=l i=l 

and |i?m|F = 0 if i < m. Simple ealeulations show that, e.g. see the proof of Theorem 1 in (55l . 
Ylm&z \Bm\%' = 0{n) for g > 2 and (3 > 1/2. Therefore, we have 

P(|Z„| >a:) <C'i^ + 2exp('-^') . (34) 

Next, we deal with (3„. Let ll'),, ^ A,j ^ lltj - Wij-i and Qij = 

EU. Wtj- Let 0 = To < Ti < ■ ■ • < Ti = n be a subsequenee of {1, • ■ ■ ,n}, where ti = 
2/ 1 < / < T — 1 and L = [log 2 n\ . Sinee Qn,o = Y^^=i = 0’ we have the deeomposition 


Qn Qn Qn,n T 


l=l 


n,Tl 


Q 


n,Tl-i) 


For eaeh j > 0, we have Dij = Yl’m=i-j+i and 

ifi — j + l<fc<i 


T’kDij — 


otherwise 
where $\mathcal{P}_k(\cdot) = E(\cdot \mid \xi_k, \xi_{k-1}, \ldots) - E(\cdot \mid \xi_{k-1}, \xi_{k-2}, \ldots)$ is the projection operator. By

Burkholder’s inequality, we have


\\Qn-QnA\l, < (2g-i) 

k=—o 

n 

= (2q -1)5] 

k=—o 

n 

= (29 - 1 ) 5 ] 


oo n 


j=n-\-l i=l 


2q 


OO n 


EE ^j^i-j^i—k^k'^(k<i<k-\-j-l) 

j=n-\-l i=l 


2q 


k=—oo 


n—j 


^ ^ ^k ^ ^ ^m,+j-k^j^mX{^-j<rn<k-l) 

j=n-\-l m=l—j 


2q 


By Fubini’s theorem, 


oo n—j 


-1 n—m —n—1 n—m 

E E =E E + E E- 

j=n+lm=l—j m=—nj=n^l m=— 00 ^= 1 —m 

Thus, we get \\Qn - Qn,n\\ 2 q < (^i where 

|2 „ _ 


= E 


k=—oo 


-1 


^ ^ ^k^^rnk$,mX 


(m<k—l) 


= E 


k=—oo 


m=—n 

—n—1 


^ ^ $,k ^2mk$,m'^(m<k 


<k-l) 


m=—oo 


, Blmk — ^ ^ ^rn+j-k^j'^ij>k-m), 

j=n+l 

n—m 

1 B2mk = 'y ^ ^rn+j-k^j'^{j>k-m)- 


2q 


j=l-m 


First, we tackle T 2 . For i = 1,2, observe that Bimk^m)m=- ,k- 2 ,k-i are backward martingale 
differences w.r.t. a{^^, ■ ■ ■ , ^k)- Using Burkholder’s inequality twice and by the Cauchy-Schwarz 
inequality, we have 


n —n—1 


T 2 < (2g - 1 ) MjB2mk^jn'^(m<k-l)\\lq 

k=—oo m =—00 

n —n—1 

< (2g - l)^||,^0,o||2g \B2mk\F'^{m<k-l) 


k=—oo m=—oo 

n —n—1 / n—m 

< (29-i)^ii&,o||J, 5] E E I® 

k=—oo m=—oo \j=l—m 


■m+j—k\ ' I11 m) 
Therefore, by the decay condition,
T 2 


n —n—1 / n—m 


(2? - l)=||&.o||J, 


~ Y1 X] X] 2 - m) + 1) ” 1 


{m<k—l) 


k=—oo m=—oo \j=l—m 
—n k—1 


< T. 

k=—oo m=—oo 

0 —n—1 n 

+ Y1 (1 - 


j=i 


k=—n+l m =—00 

_ yy _ yy 

+5^ ^ (k-mr-Ki^u-k+irfy. 

k=l m=—oo j=k 

By Karamata’s theorem and some elementary manipulations, we have 


To = 


0(116, 

O(log^(n)) if /3 — 1 

, 0(||6,.ll^,n-‘-‘‘^) if 1 > /? > 1/2 
For Ti, we apply a similar argument and it obeys the same bound as in T,. Therefore, we have 

0(|l6,illi,n‘-*) if/3>l 

0(log‘‘''^(n)) if ^ = 1 

0(116,1 Hi,""-"'’) ifl>/3>l/2 


IIQn Qn,fi||2g ^ 


and by Markov’s inequality 

ElQn - 




X 


2q 


^(11^1,11125^^*'^ if /3 > 1 

= ^ 0(116,i||2glog^'^(n)a;“29) if /? = 1 

0(116,if 1 > > 1/2 


Now, we deal with Qn,Ti — Qn,Ti_i- Fix an / = 1, • • ■ , L and let r = [n/r/] and = {1 + (r — 
1)ti, ■ ■ ■ , (rri) A n} be the r-th block of {1, • • • , n} for 1 < r < f. Let 

j = Ti_i + l ieBr 

Since Dij is j-dependent for all i, it follows that are independent and so are 

11,4, • ■ ■ ■ Let Xi = (b/vr^)/”^, 1 < Z < L. So J2f=i ^ L Fy Corollary 1.7 in ||5^ . we have 

C2\W 


IP(|Qn,n - Qn,Tl_^ > 2AzX) < Cl 


e:=i iih.^ 


2q 


A;"a:29 


+ 4 exp 


e:=i iih. 


T II 2 
We need to bound $\|Y_{l,r}\|_{2q}$. It suffices to consider the first block $r = 1$. By a similar argument
as in bounding $\|Q_n - Q_{n,n}\|_{2q}$, we have by Fubini’s theorem $\|Y_{l,1}\|_{2q} = O((T_3 + T_4)^{1/2})$, where



n 


2 

Ti-m 


Ts 

= E 

^ ^ ^3mk^rn^im<k—i) 


Btimk ^ ^ 

^m+j - fc (i > fc- m) 


k= — OQ 

m=0 

2g 

i='r!-i + l 



n 

_1 

2 

n 


T 4 

= E 


y 

B^rnk 'y ^ ‘ 



k= — OQ 


2q 

i=n-i+i 



By Burkholder’s inequality and Karamata’s theorem, we get 

Ts 


Ti-m 


(29 - l)^ll&,oll^, 


T; 

. s Z Z I E 




k=l m=0 + l 

( 


= < 


0 (r^_iV) if/3>l 

O(log 2 (r 0 ) if/3 = l 


0 (r-frf-^'') ifl>/3>l/2 


and 


T 4 


-1 


(29-l)"ll&,olli, 


4 < 


E E E 

k=-ri_i + l Tn=-Ti_i \j=Ti_i-\-l 


^j\ ' \^m-\-j—k\-^{j>k—m>l) 


It., 


if P>1 

= < 0{log\n)) if/3 = l 

if 1 > /3 > 1/2 

Since T 4 = 0{T^), we have: if /3 > 1, then 


i,Ti Qn,Ti_i \ — 2A;a:) E Ci' 


Af‘^a;29 


+4 exp 


C'sAfa;^ 


if 1 > /3 > 1/2 




Qn,T,_il > 2A;a;) < Ci 






+4 exp 


C'sAfa;^ 
Case I: $\beta > 1$. We have


P(I0„I>3X) < ^ ^ 

1=1 


X2q 

+ min < 4 exp 
1=1 


X2q 

Csx^ 




A? 





, 1 • 


For $l \ge 1$, with the choice of $\tau_l$ and $\lambda_l$, we have


and 



36 

4 / 371-4 


2(2/3-l)/-41og2/ > > 0 


2{2q-l-2q^)lliq ^ ^ 


because $2q - 1 - 2q\beta < -1$. Therefore,

f ^ 

min < 4 


1=1 



Hence, we obtain that

P(|^n-cr^| >x)<Ci 

Now, we may assume that x > for some eonstant Cq, beeause otherwise the inequality 

(30) is trivial. Then, n^~‘^^x~‘^'^ < Cqn^~^x~^, from whieh (30) follows. 

Case II: $\beta = 1$. We have

P,IQ„I > 3., < p c «t 


X2q 

L 


+ min < 4 exp 


1=1 


X2q 

C^x^ 


1=1 




2L-\ 


116,1 Il2^r; 6og^(r/)/ ’ 


By similar argument as in Case I, we obtain (30). 

Case III: The long-memory case with $1 > \beta > 1/2$. We have


X2q 

L 


+ min < 4 exp 


X2q 


1=1 


A?'' 


A? 


1=1 


Il6,i|l23^r; ^)2 / ’ 
For $l \ge 1$, we have
(i) If $3/4 < \beta < 1$,


AfrS /eV 


3-2/3 


36 


TT^ / 4'®7r^ 


2(4/3-3)/-41og2/ > 02 > 0 


and 


L 2g(2-/3)-l 

_ 

2qP^2q 

1=1 


< 

rsj 


logjn 

;=i 


= 


0(1) if 1 > /3 > 1 - l/(4g) 
0(log^'^^^(n)) if /3 = 1 — l/(4g) 

if 1 - l/(4g) >/3 > 3/4 


Henee, we obtain that 

P(|5„-a2| >x)<Ci 


F-o.i 




4(l_/3)g||^M_ll_^ 


+ exp 


C2nx^ 


n‘i~^x^ ^ V /^o,2 V 11^14112, 

We may assume that x > CqXT^^^I^ for some eonstant Og, beeause otherwise the inequality 
(30) is trivial. Then, if /3 > 1 — l/(2g), < Cq'n}~'^x~'^, and then (30) follows, 

(ii) If 3/4 > f3 > 1/2, then by a similar argument for proving the bounds on Ti and T 2 
terms, we ean show that ||Qn,n|| 2 g obeys the same bound as \\Qn — Qn,n\\ 2 q, i-O- ||Qn,n|| 2 g = 


Odl^i^iWlqTi^^^ By Markov’s inequality. 


P(|Qn| >x)<Ci 

Combining this with (34), we have (31).


n 


4(1- 


«-'ll?ull2 


X2q 


□ 

Lemma VI.4 (Sub-exponential). Assume {^ij) are Ltd. random variables satisfying a > 1/2. 
Let {]' = min(l/2, 2/3 — 1) for (3 > 1/2. Then we have for all x > 0 

P(|^n - cr^l > a:) < Oexp [-0'(n^'a:)^ 

where the constants C and C only depend on a,(3,Co in ([^ and la 0 

Proof First, eonsider the quadratie eomponent Qn = Yl/=i^i- Let 6^ = \aik\Ak+i and Al = 
Ylm=k for k > 0. Put 6*fc = 0 if fc < 0. By Lemma VI.1 6k < . Note that 


(35) 
^kQni k




l,n, are martingale differenees. Since VoWi = we 


have by [I56l Theorem l(i)], Burkholder’s inequality [I54l . and Lemma VI. 1 

LXJ / \ ^ 

IIQ ^ ^ 


'n\\q ^ 


oo / 2+n 

I'' - 1) E E 

i=—n \k=i+l 


< (9-1)"E 


i-\-n 


1 / 2 ' 


-I 2 




II' 

•mllg' 


fc=2+l 


2+n 


^^rn=l 


(36) 


< (79*“+* E E «* ) S C'g*“+*H^, 

i=—n \k=i+l / 

where [/„ = if /? > 3/4 and [/„ = if 3/4 > /3 > 1/2. Therefore, HQnllg < 
for O' > 2. Let A = l/(2a + 2). By Stirling’s formula, we have 


lim sup 

q—^oo 




etC^Xq 


(,!)*/. ^'T21%(2.,)*/M 


= eXtC^ < 1 , 


for 0 < f < (eAC^) ^ Thus, for sufficiently large go = qo{a), EJlqo ^QnWil/q'- < 
the exponential Markov inequality and Taylor’s expansion e’’ = YlT=o'^‘^/q^--’ we have 


> x) < exp(-te^/L/’)Eexp[t|LE Qn| ] < Cexp(-te^/t//'). 

The linear component follows from similar lines with the difference that ||tr(Zmi?m)||g = 


0{q^°‘^‘^\BmW)', see (32) and (33). Therefore, we get 

P(|.Sn — cT^I > x) < (7 exp —C'm.m 2“+2 , (n^'^^x) 4“+3 


Assume that n^'x > 1 because otherwise we can choose C large enough to make (35) trivially 
hold. Then,(?7,^/^x)^/*^"^“’''^^ > {pP 'and (35) follows. □ 


Next, we prove a maximal inequality for the auto-covariances of a univariate linear process. 


Lemma VL5. Suppose that Xi is a univariate linear time series (18) such that H^ollg < oo,q > A. 
Let 1 < J < n and 

n—s 

T = n~^ max | - E(XiVi+^))|. 


i=l 


Then we have 

T = (9p((log J)n"^'||^o||q), where /?' = min(l/2, 2(3 - 1) for /3 p 3/4. (37) 
Proof. Let $L_s = \sum_{m \in \mathbb{Z}} b_{s,m} (\xi_m^2 - 1)$, where $b_{s,m} = \sum_{i=1}^{n-s} a_{i-m} a_{i+s-m}$. By [57, Lemma 8], we

have

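$$\begin{aligned}
\mathbb{E}\max_{0<s<J}|L_s| &\lesssim \|\xi_0^2-1\|\Big(\max_{0<s<J}\sum_{m\in\mathbb{Z}}b_{s,m}^2\Big)^{1/2}\sqrt{\log J} + \Big[\mathbb{E}\Big(\max_{0<s<J}\max_{m\in\mathbb{Z}}b_{s,m}^2(\xi_m^2-1)^2\Big)\Big]^{1/2}\log J \\
&\le \|\xi_0^2-1\|\Big[\Big(\max_{0<s<J}\sum_{m\in\mathbb{Z}}b_{s,m}^2\Big)^{1/2}\sqrt{\log J} + \Big(\sum_{m\in\mathbb{Z}}\max_{0<s<J}b_{s,m}^2\Big)^{1/2}\log J\Big] \\
&\lesssim \|\xi_0^2-1\|\Big(\sum_{m\in\mathbb{Z}}\max_{0<s<J}b_{s,m}^2\Big)^{1/2}\log J.
\end{aligned}$$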


By Lemma VI.1 and Markov's inequality, we have

$$\max_{0<s<J}|L_s| = O_P\big(\|\xi_0^2-1\|\,n^{1/2}\log J\big). \tag{38}$$


Let $b_{s,m,m'} = \sum_{i=1}^{n-s}(a_{i-m}a_{i+s-m'} + a_{i-m'}a_{i+s-m})$ and consider $Q_s = \sum_{m'<m} b_{s,m,m'}\xi_m\xi_{m'}$.


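By the randomization inequality [58, Theorem 3.5.3],

$$\mathbb{E}\Big(\max_{0<s<J}|Q_s|\Big) \lesssim \mathbb{E}\Big(\max_{0<s<J}\Big|\sum_{m'<m}\varepsilon_m\varepsilon_{m'}b_{s,m,m'}\xi_m\xi_{m'}\Big|\Big),$$

where $\varepsilon_m$ are i.i.d. Rademacher random variables independent of the $\xi_m$'s. Let the triangle matrix $S = (b_{s,m,m'}\xi_m\xi_{m'})_{m'<m}$. Since the $\varepsilon_m$'s are sub-Gaussian, by the Hanson-Wright inequality [Theorem 1.1] conditionally on $\xi = (\xi_m)_{m\in\mathbb{Z}}$, we have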

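$$P\Big(\Big|\sum_{m'<m}\varepsilon_m\varepsilon_{m'}b_{s,m,m'}\xi_m\xi_{m'}\Big| \ge t \,\Big|\, \xi\Big) \le 2\exp\Big[-C\min\Big(\frac{t^2}{|S|_F^2}, \frac{t}{|S|_2}\Big)\Big].$$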

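Then, it follows from integration by parts and Pisier's inequality [59, Lemma 2.2.2] that

$$\mathbb{E}\Big(\max_{0<s<J}|Q_s|\Big) \lesssim (\log J)\sqrt{I}, \quad\text{where } I = \mathbb{E}\Big(\max_{0<s<J}\sum_{m'<m}b_{s,m,m'}^2\xi_m^2\xi_{m'}^2\Big).$$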

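By the triangle inequality,

$$I \le \max_{0<s<J}\sum_{m'<m}b_{s,m,m'}^2 + II + III, \tag{39}$$

where

$$II = \mathbb{E}\Big(\max_{0<s<J}\Big|\sum_{m'<m}b_{s,m,m'}^2(\xi_m^2-1)(\xi_{m'}^2-1)\Big|\Big), \qquad III = \mathbb{E}\Big(\max_{0<s<J}\Big|\sum_{m'<m}b_{s,m,m'}^2\big[(\xi_m^2-1)+(\xi_{m'}^2-1)\big]\Big|\Big).$$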
Since $\sum_{m'<m} b_{s,m,m'}^2(\xi_m^2-1)(\xi_{m'}^2-1)$ is a completely degenerate U-statistic, by the randomization inequality [58, Theorem 3.5.3], the above argument and the Jensen and Cauchy-Schwarz inequalities, we obtain that


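$$II \lesssim \mathbb{E}\Big(\max_{0<s<J}\Big|\sum_{m'<m}\varepsilon_m\varepsilon_{m'}b_{s,m,m'}^2\xi_m^2\xi_{m'}^2\Big|\Big) \lesssim (\log J)\,\mathbb{E}\Big[\max_{0<s<J}\Big(\sum_{m'<m}b_{s,m,m'}^4\xi_m^4\xi_{m'}^4\Big)^{1/2}\Big] \le (\log J)\sqrt{\mathbb{E}B}\,\sqrt{I},$$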


where $B = \max_{0<s<J}\max_{m'<m} b_{s,m,m'}^2\xi_m^2\xi_{m'}^2$. By [57, Lemma 8], we have


E 


max 

0<s<J 


1N 

s,m,m' \^m J 


m'<m 


< 


i/log J 


max 

0<s<J 


E(E^: 

.mgZ m'<m 


2 ',2 
s,m,m') 


1/2 




where B' = maxo<s<jmaXm6z(I]m/<m Now, solving the quadratic inequality 


(39), we have 


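$$I \lesssim (\log J)^2\,\mathbb{E}B + \max_{0<s<J}\sum_{m'<m}b_{s,m,m'}^2 + (\log J)\Big[\max_{0<s<J}\sum_{m\in\mathbb{Z}}\Big(\sum_{m'<m}b_{s,m,m'}^2\Big)^2\Big]^{1/2}\mathrm{Var}^{1/2}(\xi_0^2).$$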


By Lemma VI.1,


EB < \\B\\q/2 < ( Y1 = < 


^(n^ll^ol 


if /^ > I + ^ 

1 I 1 


m'<m 


O(nt(logn)9||^o||g) if/5 = i + ^ 




m' <m 


and 


E„?s E 

rnGZ \m'<m 


0{n) if (3 > 3/4 

if 3/4 > /? > 1/2 

(9(n) if /3 > 3/4 

(7(11^-®^) if 3/4 > /3 > 1/2 
Since $\log J = O(\log n)$ and $q > 4$, it follows that


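$$\mathbb{E}\Big(\max_{0<s<J}|Q_s|\Big) \lesssim \begin{cases}(\log J)\,n^{1/2}\|\xi_0\|_q^2 & \text{if } \beta > 3/4,\\ (\log J)\,n^{2-2\beta}\|\xi_0\|_q^2 & \text{if } 3/4 > \beta > 1/2.\end{cases}$$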

Combining this with (38), we have (37).


□ 